Optimal Diversity-Multiplexing Tradeoff 
in Selective-Fading MIMO Channels 



Pedro Coronel and Helmut Bölcskei

Communication Technology Laboratory 
ETH Zurich, 8092 Zurich, Switzerland 
E-mail: {pco, boelcskei}@nari.ee.ethz.ch




Abstract 

We establish the optimal diversity-multiplexing (DM) tradeoff of coherent time, frequency, and time-frequency selective-fading multiple-input multiple-output (MIMO) channels and provide a code design criterion for DM tradeoff optimality. Our results are based on the new concept of the "Jensen channel" associated with a given selective-fading MIMO channel. While the original problem seems analytically intractable because the mutual information between channel input and output is a sum of correlated random variables, the Jensen channel is equivalent to the original channel in the sense of the DM tradeoff and lends itself nicely to analytical treatment. We formulate a systematic procedure for designing DM tradeoff optimal codes for general selective-fading MIMO channels by demonstrating that the design problem can be separated into two simpler and independent problems: the design of an inner code, or precoder, adapted to the channel statistics (i.e., the selectivity characteristics) and an outer code independent of the channel statistics. Our results are supported by appealing geometric intuition, first pointed out for the flat-fading case by Zheng and Tse, IEEE Trans. Inf. Theory, 2003.

I. Introduction 

The diversity-multiplexing (DM) tradeoff framework introduced by Zheng and Tse [1] allows one to efficiently characterize the high-SNR rate-reliability tradeoff for communication over multiple-input multiple-output (MIMO) fading channels. The optimal DM tradeoff for flat-fading MIMO channels was characterized in [1]. Sparked by [1], a number of DM tradeoff optimal coding/decoding schemes for the flat-fading case were reported during the past few years. In particular, the non-vanishing determinant criterion [2], [3] on codeword difference matrices has been shown to constitute a sufficient condition for DM tradeoff optimality [3], [4]; this criterion has led to the construction of DM tradeoff optimal space-time codes based on constellation rotation [3], [5] and cyclic division algebras [4], [6]. Lattice-based space-time codes have been shown to be DM tradeoff optimal in [7]. The DM tradeoff optimality of approximately universal space-time codes was established in [8].

Part of this work was performed while the first author was with IBM Research, Zurich Research Laboratory, Switzerland. This work was supported by the STREP project No. IST-026905 MASCOT within the Sixth Framework Programme of the European Commission. This paper was presented in part at the IEEE Int. Symp. Inf. Theory (ISIT), June 2007, Nice, France.

Contributions: While the results mentioned above focus on frequency-flat block-fading channels, extensions to frequency-selective channels can be found for the single-antenna case in [9], and for the MIMO case in [10]. However, a general characterization of the optimal DM tradeoff in time, frequency, or time-frequency selective-fading MIMO channels, in the following simply referred to as selective-fading MIMO channels, does not seem to be available to date. The present paper resolves this problem for the coherent case¹, provides a code design criterion guaranteeing DM tradeoff optimality, and introduces a systematic procedure for designing
DM tradeoff optimal codes. Our results are based on upper and lower bounds on the mutual 
information of selective-fading MIMO channels; these bounds are shown to exhibit the same 
DM tradeoff behavior. In particular, we prove that for a given selective-fading MIMO channel 
the optimal DM tradeoff curve can be obtained by solving the analytically tractable problem 
of computing the DM tradeoff curve corresponding to its associated "Jensen channel". We 
demonstrate that the problem of designing DM tradeoff optimal codes can be separated into 
two simpler and independent problems: the design of an inner code, or precoder, adapted to 
the channel statistics (i.e., selectivity characteristics) and an outer code independent of the 
channel statistics. The inner code can be obtained in a systematic fashion as a function of the 
channel statistics. The design criterion for the outer code is standard with corresponding designs 
available in the literature. 

Notation: $M_T$ and $M_R$ denote the number of transmit and receive antennas, respectively. We set $m = \min(M_T, M_R)$ and $M = \max(M_T, M_R)$. For $x \in \mathbb{R}$, we let $[x]^+ = \max(0, x)$. We denote the nonnegative $m$-dimensional orthant by $\mathbb{R}_+^m$. The superscripts $^T$, $^H$, and $^*$ stand for transposition, conjugate transposition, and complex conjugation, respectively. $\mathbf{I}_n$ is the $n \times n$ identity matrix, $\mathbf{1}_n$ is the $n \times n$ all-ones matrix, $\mathbf{A} \otimes \mathbf{B}$ and $\mathbf{A} \odot \mathbf{B}$ denote, respectively, the Kronecker and Hadamard products of the matrices $\mathbf{A}$ and $\mathbf{B}$, and $\mathbf{A} \succeq \mathbf{B}$ stands for positive semidefinite ordering. Matrix multiplication has priority over the Kronecker product $\otimes$ and the Hadamard product $\odot$, so that we will write, e.g., $\mathbf{A} \odot \mathbf{B}\mathbf{C}$ for $\mathbf{A} \odot (\mathbf{B}\mathbf{C})$. $\mathbf{A}^{1/2}$ denotes the (unique) positive semidefinite square root of the positive semidefinite matrix $\mathbf{A}$. For the $n \times m$ matrices $\mathbf{A}_k$ ($k = 1, \dots, K$), $\mathrm{diag}\{\mathbf{A}_k\}_{k=1}^{K}$ denotes the $nK \times mK$ block-diagonal matrix with the $k$th diagonal entry given by $\mathbf{A}_k$. If $\mathcal{S}$ is a set, $|\mathcal{S}|$ denotes its cardinality. $\mathbf{A}(\mathcal{S}_1, \mathcal{S}_2)$ stands for the (sub)matrix consisting of the rows of $\mathbf{A}$ indexed by $\mathcal{S}_1$ and the columns of $\mathbf{A}$ indexed by $\mathcal{S}_2$. The columns and rows of the $n \times m$ matrix $\mathbf{A}$ are denoted, respectively, by $\mathbf{a}_k = [\mathbf{A}(1,k) \cdots \mathbf{A}(n,k)]^T$ ($k = 1, \dots, m$) and $\mathbf{a}^{(p)} = [\mathbf{A}(p,1) \cdots \mathbf{A}(p,m)]$ ($p = 1, \dots, n$); $\mathrm{vec}(\mathbf{A}) = [\mathbf{a}_1^T \cdots \mathbf{a}_m^T]^T$. For an $n \times 1$ vector $\mathbf{a} = [a_1 \cdots a_n]^T$, $\mathbf{D}_{\mathbf{a}} = \mathrm{diag}\{a_m\}_{m=1}^{n}$, and $\mathbf{a}(m)$ refers to $a_m$. The $n \times n$ FFT matrix $\boldsymbol{\Psi}$ is given by $\boldsymbol{\Psi}(k,l) = \frac{1}{\sqrt{n}} e^{-j2\pi(k-1)(l-1)/n}$ ($k, l = 1, \dots, n$). The determinant, trace, and rank of $\mathbf{A}$ are denoted as $\det(\mathbf{A})$, $\mathrm{Tr}(\mathbf{A})$, and $\mathrm{rank}(\mathbf{A})$, respectively, and $\|\mathbf{A}\|_F^2 = \mathrm{Tr}(\mathbf{A}\mathbf{A}^H)$. The nonzero eigenvalues of the $n \times n$ Hermitian matrix $\mathbf{A}$, sorted in ascending order, are designated as $\lambda_k(\mathbf{A})$, $k = 1, \dots, \mathrm{rank}(\mathbf{A})$. The Kronecker delta function is defined as $\delta_{m,n} = 1$ for $m = n$ and zero otherwise. If $X$ and $Y$ are random variables (RVs), $X \sim Y$ denotes equivalence in distribution, and $\mathbb{E}_X$ is the expectation operator with respect to (w.r.t.) the RV $X$. The random vector $\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{C})$ is jointly proper Gaussian (JPG) with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{C}$. The inner product between two signals $u(t)$ and $v(t)$ is denoted as $\langle u, v \rangle = \int_t u(t) v^*(t)\, dt$. The functions $f(x)$ and $g(x)$ are said to be exponentially equal, denoted by $f(x) \doteq g(x)$, if $\lim_{x \to \infty} \frac{\log f(x)}{\log x} = \lim_{x \to \infty} \frac{\log g(x)}{\log x}$. Exponential inequality, denoted by $\dot{\geq}$ and $\dot{\leq}$, is defined analogously.

¹ Throughout the paper, we assume that the receiver has perfect channel state information (CSI) and the transmitter does not have CSI, but is aware of the channel law.



II. Channel and signal model

A. Channel model

A time-frequency selective single-input single-output (SISO) channel can be modeled as a stochastic linear time-varying (LTV) system [11] with (noise-free) input-output (I/O) relation

$$ r(t) = (\mathbb{H}x)(t) = \int_{t'} k_{\mathbb{H}}(t,t')\, x(t')\, dt' $$

where $x(t)$ is the input signal, $r(t)$ is the output signal, and the effect of the channel is described by the linear operator $\mathbb{H}$ with random kernel $k_{\mathbb{H}}(t,t')$. The time-varying impulse response defined as $h_{\mathbb{H}}(t,\tau) = k_{\mathbb{H}}(t, t-\tau)$ yields the equivalent (noise-free) I/O relation

$$ r(t) = \int_{\tau} h_{\mathbb{H}}(t,\tau)\, x(t-\tau)\, d\tau. \tag{1} $$



July 14, 2009 



DRAFT 



Two additional system functions that will be important in the ensuing developments are the time-varying transfer function

$$ L_{\mathbb{H}}(t,f) = \int_{\tau} h_{\mathbb{H}}(t,\tau)\, e^{-j2\pi f\tau}\, d\tau \tag{2} $$

and the spreading function

$$ S_{\mathbb{H}}(\tau,\nu) = \int_{t} h_{\mathbb{H}}(t,\tau)\, e^{-j2\pi\nu t}\, dt. \tag{3} $$

As an alternative to (1), we may write the I/O relation in terms of the spreading function as

$$ r(t) = \int_{\tau}\int_{\nu} S_{\mathbb{H}}(\tau,\nu)\, x(t-\tau)\, e^{j2\pi\nu t}\, d\tau\, d\nu. \tag{4} $$

The output signal is thus a weighted superposition of time-frequency shifted replicas of the input signal $x(t)$, where the shifts are parametrized by delay $\tau$ and Doppler shift $\nu$, and $S_{\mathbb{H}}(\tau,\nu)$ corresponds to the weighting function.

Statistical characterization: The channel impulse response $h_{\mathbb{H}}(t,\tau)$ is a zero-mean JPG process which is wide-sense stationary in time $t$ and uncorrelated in delay $\tau$, i.e., it satisfies the wide-sense stationary uncorrelated-scattering (WSSUS) assumption [11]

$$ \mathbb{E}\{h_{\mathbb{H}}(t,\tau)\, h_{\mathbb{H}}^*(t',\tau')\} = r_{\mathbb{H}}(t-t',\tau)\, \delta(\tau-\tau'). $$

Hence, the time-delay correlation function $r_{\mathbb{H}}(t,\tau)$ fully characterizes the channel statistics. The WSSUS property implies that $L_{\mathbb{H}}(t,f)$ is wide-sense stationary in both $t$ and $f$, and $S_{\mathbb{H}}(\tau,\nu)$ is uncorrelated in delay $\tau$ and Doppler $\nu$:

$$ \mathbb{E}\{L_{\mathbb{H}}(t,f)\, L_{\mathbb{H}}^*(t',f')\} = R_{\mathbb{H}}(t-t', f-f') $$

$$ \mathbb{E}\{S_{\mathbb{H}}(\tau,\nu)\, S_{\mathbb{H}}^*(\tau',\nu')\} = C_{\mathbb{H}}(\tau,\nu)\, \delta(\tau-\tau')\, \delta(\nu-\nu') $$

where the scattering function $C_{\mathbb{H}}(\tau,\nu)$ and the time-frequency correlation function $R_{\mathbb{H}}(t-t', f-f')$ are related through a two-dimensional Fourier transform according to

$$ C_{\mathbb{H}}(\tau,\nu) = \int_{t}\int_{f} R_{\mathbb{H}}(t,f)\, e^{-j2\pi(\nu t - \tau f)}\, dt\, df. \tag{5} $$

Because $R_{\mathbb{H}}(t,f)$ is the correlation function of the process $L_{\mathbb{H}}(t,f)$, which is stationary in $t$ and $f$, $C_{\mathbb{H}}(\tau,\nu)$ is a real-valued and nonnegative function that can be interpreted as the spectrum of the channel process.

The underspread assumption and its consequences: We assume that the channel operator $\mathbb{H}$ is underspread [12] so that the scattering function $C_{\mathbb{H}}(\tau,\nu)$ is compactly supported within the rectangle $[0,\tau_0] \times [0,\nu_0]$, i.e.,

$$ C_{\mathbb{H}}(\tau,\nu) = 0 \quad \text{for } (\tau,\nu) \notin [0,\tau_0] \times [0,\nu_0] $$

with the total channel spread $\Delta_{\mathbb{H}} = \tau_0 \nu_0$ satisfying $\Delta_{\mathbb{H}} < 1$. Note that this implies that the spreading function $S_{\mathbb{H}}(\tau,\nu)$ is also supported in this rectangle with probability 1 (w.p.1). The underspread assumption is relevant as most mobile radio channels are (in fact highly) underspread. Moreover, underspread channels have a set of approximate deterministic and structured eigenfunctions, which allows one to discretize the I/O relation (1) as described next.

B. Signaling on approximate eigenfunctions of the channel 

We build our developments on the fact that underspread channels are approximately diagonalized by orthogonal Weyl-Heisenberg bases [12] that are obtained by time-frequency shifting a prototype pulse $g(t)$ according to

$$ g_{m,k}(t) = g(t - mT)\, e^{j2\pi kFt} $$

where the grid parameters $T$ and $F$ satisfy $TF \geq 1$ and the basis $\{g_{m,k}(t)\}$ is orthonormal, i.e.,

$$ \langle g_{m,k}, g_{n,p} \rangle = \int_t g_{m,k}(t)\, g_{n,p}^*(t)\, dt = \delta_{m,n}\, \delta_{k,p}. \tag{6} $$

Details on the choice of $g(t)$ can be found in [13]. For grid parameters chosen so that $T \leq 1/\nu_0$ and $F \leq 1/\tau_0$, and hence $TF \leq 1/\Delta_{\mathbb{H}}$, it has been shown in [12], [13] that the impulse response of the underspread fading channel can be well approximated by setting

$$ k_{\mathbb{H}}(t,t') = \sum_{m=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} L_{\mathbb{H}}(mT,kF)\, g_{m,k}(t)\, g_{m,k}^*(t') \tag{7} $$

where the samples of the time-varying transfer function $L_{\mathbb{H}}(mT,kF)$ are, as a consequence of the assumption on $h_{\mathbb{H}}(t,\tau)$ being a zero-mean JPG process, JPG random variables with zero mean and correlation function

$$ \mathbb{E}\{L_{\mathbb{H}}(mT,kF)\, L_{\mathbb{H}}^*(nT,pF)\} = R_{\mathbb{H}}((m-n)T, (k-p)F). \tag{8} $$

The variance of each channel coefficient $L_{\mathbb{H}}(mT,kF)$ follows from (8) and (5) as

$$ \sigma_{\mathbb{H}}^2 = R_{\mathbb{H}}(0,0) = \int_{\tau}\int_{\nu} C_{\mathbb{H}}(\tau,\nu)\, d\tau\, d\nu. $$



Canonical characterization of signaling schemes: Based on the developments in the previous paragraph, we construct the transmit signal as a linear combination of the (approximate) eigenfunctions of the channel operator according to

$$ x(t) = \sum_{m=-\infty}^{\infty} \sum_{k=0}^{K-1} x_{m,k}\, g_{m,k}(t) \tag{9} $$

where the $x_{m,k}$ are the information-bearing (complex-valued) data symbols. This modulation scheme corresponds to pulse-shaped orthogonal frequency-division multiplexing (OFDM) with symbol duration $T$, tone spacing $F$, and effective signal bandwidth $W = KF$. The receiver computes the inner products $y_{m,k} = \langle y, g_{m,k} \rangle$, where $y(t) = r(t) + z(t)$ and $z(t)$ is additive white Gaussian noise with $\mathbb{E}\{z(t) z^*(t')\} = \delta(t-t')$. Normalizing the data symbols according to $x_{m,k} \to \sqrt{\mathrm{SNR}}\, x_{m,k}$, with SNR denoting the average signal-to-noise ratio, the overall I/O relation is given by

$$ y_{m,k} = \sqrt{\mathrm{SNR}}\, L_{\mathbb{H}}(mT,kF)\, x_{m,k} + z_{m,k} \tag{10} $$

where, due to the orthonormality of the basis functions $\{g_{m,k}(t)\}$, the random variables $z_{m,k} = \langle z, g_{m,k} \rangle$ are independent and identically distributed (i.i.d.) across $m$ and $k$, and satisfy $z_{m,k} \sim \mathcal{CN}(0,1)$ for all $m$ and $k$. In essence, this scheme corresponds to transmitting and receiving on the channel's (approximate) eigenfunctions and, hence, leads to a diagonalization of the channel. For details on the discretization of the I/O relation (1) described above, the interested reader is referred to [13].

C. Input-output relation with multiple antennas 

We assume that communication takes place over $M$ time slots and $K$ frequency slots. For the sake of simplicity of notation, we introduce the bijective mapping $\mathcal{M}$, defined as

$$ \mathcal{M}: \{0,\dots,M-1\} \times \{0,\dots,K-1\} \to \{0,\dots,N-1\}, \qquad (m,k) \mapsto n = mK + k \tag{11} $$

where $N = MK$, to index the time-frequency slots $(m,k)$ in (10) according to $n = \mathcal{M}(m,k)$. We extend the I/O relation (10) to the MIMO case assuming $M_T$ transmit and $M_R$ receive antennas, with the scalar subchannels of the $M_R \times M_T$ MIMO channel having statistically independent kernels with identical statistics, i.e., with identical scattering functions. Consequently, all subchannels are approximately diagonalized by the same Weyl-Heisenberg basis so that, based on (10) and the mapping in (11), we get

$$ \mathbf{y}_n = \sqrt{\frac{\mathrm{SNR}}{M_T}}\, \mathbf{H}_n \mathbf{x}_n + \mathbf{z}_n, \qquad n = 0,\dots,N-1 \tag{12} $$

where SNR is the average signal-to-noise ratio at each receive antenna, $\mathbf{y}_n$, $\mathbf{x}_n$, and $\mathbf{z}_n$ denote, respectively, the corresponding $M_R \times 1$ receive signal vector, $M_T \times 1$ transmit signal vector, and $M_R \times 1$ JPG noise vector satisfying $\mathbf{z}_n \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{M_R})$, and the channel matrices are given by $\mathbf{H}_n(i,j) = L_{\mathbb{H}}^{(i,j)}(mT,kF)$ ($i = 1,\dots,M_R$, $j = 1,\dots,M_T$), where the superscript $(i,j)$ designates the time-varying transfer function corresponding to the subchannel between transmit antenna $j$ and receive antenna $i$. In the sequel, we shall use $\mathbf{X} = [\mathbf{x}_0 \cdots \mathbf{x}_{N-1}]$ and $\mathbf{Y} = [\mathbf{y}_0 \cdots \mathbf{y}_{N-1}]$ to denote the transmit codeword matrix and the received signal matrix, respectively.
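The slot indexing in (11) and the MIMO I/O relation (12) can be sketched numerically as follows. This is a minimal illustration assuming numpy; the helper names `slot_index`, `slot_pair`, and `mimo_io` are hypothetical, and the channel stack is stored with shape $(N, M_R, M_T)$ purely for convenience.

```python
import numpy as np


def slot_index(m: int, k: int, K: int) -> int:
    """Map time slot m and frequency slot k to n = m*K + k, as in (11)."""
    return m * K + k


def slot_pair(n: int, K: int) -> tuple:
    """Inverse map: recover (m, k) from the slot index n."""
    return divmod(n, K)


def mimo_io(H: np.ndarray, X: np.ndarray, snr: float, rng: np.random.Generator) -> np.ndarray:
    """Apply y_n = sqrt(SNR/M_T) H_n x_n + z_n to all N slots at once.

    H: (N, M_R, M_T) stack of channel matrices H_n
    X: (M_T, N) codeword matrix X = [x_0 ... x_{N-1}]
    Returns Y = [y_0 ... y_{N-1}] of shape (M_R, N).
    """
    N, M_R, M_T = H.shape
    # z_n ~ CN(0, I): real and imaginary parts each have variance 1/2
    Z = (rng.standard_normal((M_R, N)) + 1j * rng.standard_normal((M_R, N))) / np.sqrt(2)
    # H_n @ x_n for every n, arranged column-wise
    HX = np.einsum("nij,jn->in", H, X)
    return np.sqrt(snr / M_T) * HX + Z


# Usage: M = 3 time slots, K = 4 tones, 2 x 2 antennas
M, K, M_T, M_R = 3, 4, 2, 2
N = M * K
rng = np.random.default_rng(0)
H = (rng.standard_normal((N, M_R, M_T)) + 1j * rng.standard_normal((N, M_R, M_T))) / np.sqrt(2)
X = np.ones((M_T, N), dtype=complex)  # ||X||_F^2 = N M_T
Y = mimo_io(H, X, snr=100.0, rng=rng)
```

Here the channel matrices are drawn i.i.d. only to exercise the code; the correlation across slots described next is not modeled in this sketch.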

Because the scalar subchannels are assumed to have statistically independent kernels with identical statistics, the channel matrices are spatially uncorrelated and the correlation across slots is given by the time-frequency correlation function in (8). In particular, for any two time-frequency slots $n = \mathcal{M}(m,k)$ and $n' = \mathcal{M}(m',k')$, where $n, n' \in \{0,\dots,N-1\}$, we have

$$ \mathbb{E}\{\mathbf{H}_n(i,j)\, (\mathbf{H}_{n'}(i,j))^*\} = R_{\mathbb{H}}((m-m')T, (k-k')F) \tag{13} $$

for $i = 1,\dots,M_R$ and $j = 1,\dots,M_T$. For later use, we define the corresponding $N \times N$ covariance matrix $\mathbf{R}_{\mathbf{H}}$ as

$$ \mathbf{R}_{\mathbf{H}}(n,n') = R_{\mathbb{H}}((m-m')T, (k-k')F) \tag{14} $$

and the stacked channel matrix $\mathbf{H} = [\mathbf{H}_0 \cdots \mathbf{H}_{N-1}]$. Note that with the notation and assumptions in place, we have

$$ \mathbb{E}\{\mathrm{vec}(\mathbf{H})\, (\mathrm{vec}(\mathbf{H}))^H\} = \mathbf{R}_{\mathbf{H}} \otimes \mathbf{I}_{M_T M_R}. \tag{15} $$

The I/O relation (12) and the channel correlation function (13) are obtained using a signaling scheme that (approximately) diagonalizes the time-frequency selective channel. We stress, however, that (12) is a general I/O relation that encompasses other widely used models, as for example those in [14], [15, Ch. 3, Sec. 2] used to characterize linear frequency-invariant (LFI) channels, and the cyclic signal model resulting from the use of OFDM modulation over linear time-invariant (LTI) channels [16]. The results developed in this paper therefore apply to these models as well, provided one takes into account the corresponding structural differences in the covariance matrix (14). We will particularize the main results in this paper to the most important instances of the models used in [14]-[16].
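To make (14) and (15) concrete, the sketch below builds $\mathbf{R}_{\mathbf{H}}$ for a hypothetical scattering function that is uniform on $[0,\tau_0]\times[0,\nu_0]$ with total power $\sigma_{\mathbb{H}}^2 = 1$ (an assumption chosen only because inverting (5) then gives a closed form), and draws correlated channel coefficients for one scalar subchannel via a matrix square root. The numerical values of $\tau_0$, $\nu_0$, $T$, $F$ are illustrative assumptions satisfying $T \le 1/\nu_0$ and $F \le 1/\tau_0$; the helper names are hypothetical.

```python
import numpy as np


def tf_correlation(t, f, tau0, nu0, sigma2=1.0):
    """R_H(t, f) for a scattering function assumed uniform on [0, tau0] x [0, nu0].

    Inverting (5) gives sigma2 * e^{j pi nu0 t} sinc(nu0 t) * e^{-j pi tau0 f} sinc(tau0 f),
    where np.sinc(x) = sin(pi x) / (pi x).
    """
    doppler = np.exp(1j * np.pi * nu0 * t) * np.sinc(nu0 * t)
    delay = np.exp(-1j * np.pi * tau0 * f) * np.sinc(tau0 * f)
    return sigma2 * doppler * delay


def covariance_matrix(M, K, T, F, tau0, nu0, sigma2=1.0):
    """N x N matrix R_H(n, n') = R_H((m - m')T, (k - k')F) with n = mK + k, cf. (14)."""
    m, k = np.divmod(np.arange(M * K), K)
    dt = (m[:, None] - m[None, :]) * T
    df = (k[:, None] - k[None, :]) * F
    return tf_correlation(dt, df, tau0, nu0, sigma2)


def channel_sqrt(R):
    """Return A with A A^H = R (eigenvalues clipped at 0 to absorb round-off)."""
    w, U = np.linalg.eigh(R)
    return U * np.sqrt(np.clip(w, 0.0, None))


# Hypothetical numbers: tau0 = 5 us, nu0 = 100 Hz, T = 5 ms, F = 100 kHz
tau0, nu0, T, F = 5e-6, 100.0, 5e-3, 1e5
R = covariance_matrix(M=3, K=4, T=T, F=F, tau0=tau0, nu0=nu0)
A = channel_sqrt(R)

# Row t of h holds [H_0(i,j) ... H_{N-1}(i,j)] for trial t of one scalar subchannel.
rng = np.random.default_rng(0)
trials, N = 20000, R.shape[0]
W = (rng.standard_normal((trials, N)) + 1j * rng.standard_normal((trials, N))) / np.sqrt(2)
h = W @ A.T
R_emp = h.T @ h.conj() / trials  # empirical E{H_n(i,j) H_n'(i,j)^*}, cf. (13)
```

Because all $M_T M_R$ subchannels are independent with identical statistics, repeating this draw per antenna pair yields the Kronecker structure $\mathbf{R}_{\mathbf{H}} \otimes \mathbf{I}_{M_T M_R}$ in (15).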

III. Diversity-multiplexing tradeoff 

A. Preliminaries 

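When the receiver has perfect CSI, as assumed in this paper, the Gaussian distribution maximizes the mutual information for a given input covariance matrix. Assuming that

$$ \mathbb{E}\{\mathrm{vec}(\mathbf{X})\, (\mathrm{vec}(\mathbf{X}))^H\} = \mathbf{Q} $$

where $\mathbf{Q}$ has dimension $NM_T \times NM_T$, the maximum mutual information corresponding to the channel in (12) is obtained for $\mathrm{vec}(\mathbf{X}) \sim \mathcal{CN}(\mathbf{0}, \mathbf{Q})$, and is given by

$$ I(\mathbf{Y};\mathbf{X} \,|\, \mathbf{D}_{\mathbf{H}}) = \frac{1}{N} \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\, \mathbf{D}_{\mathbf{H}} \mathbf{Q} \mathbf{D}_{\mathbf{H}}^H\right) \tag{16} $$

where $\mathbf{D}_{\mathbf{H}} = \mathrm{diag}\{\mathbf{H}_n\}_{n=0}^{N-1}$. For an average power constraint, specifically $\mathrm{Tr}(\mathbf{Q}) \leq NM_T$, the outage probability at data rate $R$ follows from (16) by optimizing over the input covariance matrix as

$$ P_{\mathrm{out}}(R) = \inf_{\mathbf{Q} \succeq \mathbf{0},\ \mathrm{Tr}(\mathbf{Q}) \leq NM_T} \mathbb{P}\left(\frac{1}{N} \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\, \mathbf{D}_{\mathbf{H}} \mathbf{Q} \mathbf{D}_{\mathbf{H}}^H\right) < R\right). \tag{17} $$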

The outage probability is of particular importance for the characterization of the rate-reliability tradeoff because, at high SNR, it constitutes a fundamental lower bound on the error probability. Before proceeding with the analysis of (17), we recall a central concept in the DM tradeoff framework.

A family of codes $\mathcal{C}_r$ [1] is a sequence of codebooks $\mathcal{C}_r(\mathrm{SNR})$ parametrized by SNR and with fixed block length. At a given SNR, the corresponding codebook $\mathcal{C}_r(\mathrm{SNR})$ contains $\mathrm{SNR}^{rN}$ codewords, implying that the data rate $R(\mathrm{SNR})$ scales with SNR according to $R(\mathrm{SNR}) = r \log \mathrm{SNR}$. We say that $\mathcal{C}_r$ operates at multiplexing rate $r \in [0, m]$. Since the ergodic channel capacity scales as $m \log \mathrm{SNR}$, the ratio $r/m$ represents the fraction of the ergodic channel capacity that $\mathcal{C}_r$ operates at as SNR increases. The DM tradeoff realized by the family of codes $\mathcal{C}_r$ is characterized by the function

$$ d(\mathcal{C}_r) = -\lim_{\mathrm{SNR} \to \infty} \frac{\log P_e(\mathcal{C}_r)}{\log \mathrm{SNR}} \tag{18} $$

where $P_e(\mathcal{C}_r)$ is the error probability obtained through maximum-likelihood (ML) detection. Moreover, the optimal DM tradeoff curve

$$ d^*(r) = \sup_{\mathcal{C}_r} d(\mathcal{C}_r) \tag{19} $$

quantifies the maximum achievable diversity gain over all families (w.r.t. SNR) of codes that operate at multiplexing rate $r$.

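Following the arguments that lead to [1, Eq. (9)], we shall next show that choosing $\mathbf{Q} = \mathbf{I}$ is DM tradeoff optimal in the selective-fading case as well. More specifically, we demonstrate that $\mathbf{Q} = \mathbf{I}$ solves the optimization problem in (17) in the high-SNR limit. First, we note that an upper bound on $P_{\mathrm{out}}(R)$ can be obtained by setting $\mathbf{Q} = \mathbf{I}$. On the other hand, because $\mathbf{Q}$ satisfies the power constraint $\mathrm{Tr}(\mathbf{Q}) \leq NM_T$, we necessarily have $\mathbf{Q} \preceq NM_T \mathbf{I}$. Since $\log\det(\mathbf{A})$ is increasing on the cone of positive definite matrices $\mathbf{A}$ [17, p. 111], replacing $\mathbf{Q}$ by $NM_T \mathbf{I}$ in (16) increases the mutual information, and hence yields a lower bound on $P_{\mathrm{out}}(R)$. Combining these arguments, we get

$$
\begin{aligned}
\mathbb{P}\left(\frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + \mathrm{SNR}\, N\, \mathbf{H}_n \mathbf{H}_n^H\right) < R\right) &\leq P_{\mathrm{out}}(R) \\
&\leq \mathbb{P}\left(\frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\, \mathbf{H}_n \mathbf{H}_n^H\right) < R\right).
\end{aligned} \tag{20}
$$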

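Noting that the upper and lower bounds in (20) differ only by a constant factor multiplying the SNR, and using the fact that

$$
\begin{aligned}
\lim_{\mathrm{SNR} \to \infty} \frac{\log \mathbb{P}\left(\frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + c\, \mathrm{SNR}\, \mathbf{H}_n \mathbf{H}_n^H\right) < R\right)}{\log \mathrm{SNR}}
&= \lim_{\mathrm{SNR} \to \infty} \frac{\log \mathbb{P}\left(\frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + c\, \mathrm{SNR}\, \mathbf{H}_n \mathbf{H}_n^H\right) < R\right)}{\log (c\, \mathrm{SNR})} \\
&= \lim_{\mathrm{SNR} \to \infty} \frac{\log \mathbb{P}\left(\frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + \mathrm{SNR}\, \mathbf{H}_n \mathbf{H}_n^H\right) < R\right)}{\log \mathrm{SNR}}
\end{aligned} \tag{21}
$$

for any $c \in \mathbb{R}_+$ independent of SNR, we get

$$ P_{\mathrm{out}}(R) \doteq \mathbb{P}(I(\mathrm{SNR}) < R) \tag{22} $$

where

$$ I(\mathrm{SNR}) = \frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\, \mathbf{H}_n \mathbf{H}_n^H\right). \tag{23} $$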
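The reason the two bounds in (20) share the same SNR exponent can be checked numerically: for $c \geq 1$ and $x \geq 0$ we have $\log(1 + cx) - \log(1 + x) \leq \log c$, and at most $m$ eigenvalues of $\mathbf{H}_n\mathbf{H}_n^H$ are nonzero, so the gap between the two mutual informations stays below $m \log(NM_T)$ for every SNR while both grow like $\log \mathrm{SNR}$. The sketch below is a minimal illustration assuming numpy and an i.i.d. channel draw; `mutual_info` is a hypothetical helper name, and logarithms are natural.

```python
import numpy as np


def mutual_info(H, snr):
    """I(SNR) from (23), in nats: mean over n of log det(I + SNR/M_T H_n H_n^H)."""
    N, M_R, M_T = H.shape
    G = H @ H.conj().transpose(0, 2, 1)  # stack of H_n H_n^H, shape (N, M_R, M_R)
    _, logdet = np.linalg.slogdet(np.eye(M_R) + (snr / M_T) * G)
    return float(np.mean(logdet))


rng = np.random.default_rng(0)
N, M_R, M_T = 8, 2, 3
H = (rng.standard_normal((N, M_R, M_T)) + 1j * rng.standard_normal((N, M_R, M_T))) / np.sqrt(2)
m = min(M_R, M_T)
c = N * M_T  # replacing Q by N M_T I scales the effective SNR by this factor, cf. (20)

for snr_db in (10, 20, 40):
    snr = 10 ** (snr_db / 10)
    gap = mutual_info(H, c * snr) - mutual_info(H, snr)
    # the gap is bounded by a constant independent of SNR
    print(snr_db, round(gap, 3), "<=", round(m * np.log(c), 3))
```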

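The outage probability can be characterized in terms of the "singularity levels" of the channel matrices defined as

$$ \mu_{n,k} = -\frac{\log \lambda_k(\mathbf{H}_n \mathbf{H}_n^H)}{\log \mathrm{SNR}}, \qquad n = 0,\dots,N-1,\ k = 1,\dots,m. \tag{24} $$

Rewriting (23) in terms of the singularity levels and letting the data rate scale with SNR as $R(\mathrm{SNR}) = r \log \mathrm{SNR}$, it can be shown by applying [1, Th. 4] that

$$ P_{\mathrm{out}}(r \log \mathrm{SNR}) \doteq \mathbb{P}(\mathcal{O}_r) \tag{25} $$

where

$$ \mathcal{O}_r = \left\{ \boldsymbol{\mu}_n \in \mathbb{R}_+^m,\ n = 0,\dots,N-1 :\ \frac{1}{N} \sum_{n=0}^{N-1} \sum_{k=1}^{m} [1 - \mu_{n,k}]^+ < r \right\} \tag{26} $$

with $\boldsymbol{\mu}_n = [\mu_{n,1} \cdots \mu_{n,m}]^T$. In the high-SNR limit, the outage probability can be characterized through its SNR exponent given by

$$ d_o(r) = -\lim_{\mathrm{SNR} \to \infty} \frac{\log P_{\mathrm{out}}(r \log \mathrm{SNR})}{\log \mathrm{SNR}} = -\lim_{\mathrm{SNR} \to \infty} \frac{\log \mathbb{P}(\mathcal{O}_r)}{\log \mathrm{SNR}} \tag{27} $$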
where we used (25). Unlike in the frequency-flat fading case treated in [1], computing $d_o(r)$ for the selective-fading case seems analytically intractable, with the main difficulty stemming from the fact that one has to deal with the sum of correlated (recall that the $\mathbf{H}_n$ are correlated across $n$) terms in (23), for which the joint distribution of the corresponding singularity levels in (24) is in general unknown. It turns out, however, that one can find lower and upper bounds on $I(\mathrm{SNR})$ in (23) which are exponentially tight in SNR (and, hence, preserve the DM tradeoff behavior) and analytically tractable. The next subsection formalizes this idea.
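The link between (23) and the outage event (26) rests on the approximation $\log(1 + \tfrac{\mathrm{SNR}}{M_T}\lambda) \approx [1-\mu]^+ \log\mathrm{SNR}$ for $\lambda = \mathrm{SNR}^{-\mu}$. Per eigenvalue the error lies in $[-\log M_T, \log 2]$, so $|I(\mathrm{SNR}) - \log\mathrm{SNR}\cdot\frac{1}{N}\sum_n\sum_k[1-\mu_{n,k}]^+| \leq m \max(\log M_T, \log 2)$, which vanishes relative to $\log \mathrm{SNR}$. The sketch below computes singularity levels for one hypothetical i.i.d. channel draw and evaluates the outage test in (26); it assumes numpy, natural logarithms, and hypothetical helper names.

```python
import numpy as np


def gram_eigs(H):
    """The m largest eigenvalues of each H_n H_n^H (ascending); the rest are zero."""
    N, M_R, M_T = H.shape
    m = min(M_R, M_T)
    G = H @ H.conj().transpose(0, 2, 1)
    return np.linalg.eigvalsh(G)[:, -m:]


def singularity_levels(H, snr):
    """mu_{n,k} = -log lambda_k(H_n H_n^H) / log SNR, cf. (24); shape (N, m)."""
    return -np.log(gram_eigs(H)) / np.log(snr)


def outage_statistic(mu):
    """(1/N) sum_n sum_k [1 - mu_{n,k}]^+, the quantity compared with r in (26)."""
    return float(np.clip(1.0 - mu, 0.0, None).sum(axis=1).mean())


def mutual_info(H, snr):
    """I(SNR) from (23) in nats, written via the same (nonzero) eigenvalues."""
    M_T = H.shape[2]
    return float(np.log1p(snr / M_T * gram_eigs(H)).sum(axis=1).mean())


rng = np.random.default_rng(1)
N, M_R, M_T = 6, 2, 2
H = (rng.standard_normal((N, M_R, M_T)) + 1j * rng.standard_normal((N, M_R, M_T))) / np.sqrt(2)
snr, r = 1e8, 1.5
mu = singularity_levels(H, snr)
print("statistic:", outage_statistic(mu), "in O_r:", outage_statistic(mu) < r)
print("I(SNR)/log SNR:", mutual_info(H, snr) / np.log(snr))
```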
Throughout the paper, we shall enforce the peak power constraint

$$ \|\mathbf{X}\|_F^2 \leq NM_T, \qquad \forall\, \mathbf{X} \in \mathcal{C}_r(\mathrm{SNR}). \tag{28} $$



The families of codes $\mathcal{C}_r$ that satisfy the power constraint (28) constitute a subset of the families of codes satisfying the average power constraint induced by $\mathrm{Tr}(\mathbf{Q}) \leq NM_T$, based on which the outage probability in (17) was formulated; it will become manifest, however, that in the high-SNR limit one can find families of codes that satisfy the more restrictive power constraint (28) and still exhibit an error probability that is asymptotically equal to the outage probability. The power constraint (28) implies that the vectorized codeword matrices, i.e., $\mathrm{vec}(\mathbf{X})$, of any (w.r.t. SNR) codebook $\mathcal{C}_r(\mathrm{SNR})$ lie inside a sphere of radius $\sqrt{NM_T}$ in $\mathbb{C}^{M_T N}$ centered at the origin. As this sphere radius is constant w.r.t. SNR, its interior becomes increasingly packed with codeword matrices as SNR grows (the codebook size increases according to $|\mathcal{C}_r(\mathrm{SNR})| = \mathrm{SNR}^{rN}$ to sustain the rate $R(\mathrm{SNR}) = r \log \mathrm{SNR}$). The codeword difference matrices $\mathbf{E} = \mathbf{X} - \mathbf{X}'$, with $\mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})$, are, therefore, a function of SNR. For the sake of simplicity of notation, we do not make this dependency explicit. In the case $N = 1$ and $M_T = 1$, for example, an admissible $\mathcal{C}_r$ would be the family of quadrature amplitude modulation (QAM) constellations² $\mathcal{A}$ given by

$$ \mathcal{A}(\mathrm{SNR}) = \left\{ \frac{\sqrt{2}}{\mathrm{SNR}^{r/2}}\, (a + jb),\ a, b \in \mathbb{Z} :\ -\frac{\mathrm{SNR}^{r/2}}{2} \leq a, b \leq \frac{\mathrm{SNR}^{r/2}}{2} \right\}. \tag{29} $$

Note that $\mathcal{A}(\mathrm{SNR})$ has $|\mathcal{A}(\mathrm{SNR})| \doteq \mathrm{SNR}^r$ constellation points $x$ satisfying the power constraint $|x|^2 \leq 1$. Consequently, the minimum distance in this family of codes scales as $d_{\min}^2 \doteq \mathrm{SNR}^{-r}$, i.e., as the area of the unit disk divided by the number of constellation points in $\mathcal{A}(\mathrm{SNR})$.
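The scaling claims for the QAM family (29) can be checked numerically at a single, large SNR value. This is a minimal sketch assuming numpy; `qam_family` is a hypothetical helper name, and the parameters $\mathrm{SNR} = 10^8$, $r = 0.5$ are chosen only to keep the constellation small. Note that the exponents in finite-SNR checks only approach $r$ and $-r$ slowly, because of constant factors such as the $\sqrt{2}$ in (29).

```python
import numpy as np


def qam_family(snr, r):
    """Constellation A(SNR) of (29): sqrt(2) SNR^{-r/2} (a + jb), |a|, |b| <= SNR^{r/2}/2."""
    half = snr ** (r / 2) / 2
    a = np.arange(np.ceil(-half), np.floor(half) + 1)
    A, B = np.meshgrid(a, a)
    return (np.sqrt(2) / snr ** (r / 2)) * (A + 1j * B).ravel()


snr, r = 1e8, 0.5
x = qam_family(snr, r)
d_min = np.min(np.diff(np.unique(x.real)))  # grid spacing equals the minimum distance
print("points:", len(x), "max |x|^2:", np.max(np.abs(x) ** 2))
print("size exponent:", np.log(len(x)) / np.log(snr))            # close to r
print("distance exponent:", np.log(d_min ** 2) / np.log(snr))    # close to -r
```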

² A discussion of the DM tradeoff properties of QAM constellations for the scalar Rayleigh fading channel can be found in [15, Sec. 9.1.2].

B. Jensen channel and Jensen outage event

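We start by deriving a lower bound on the outage probability obtained by upper-bounding the mutual information through Jensen's inequality applied as

$$ I(\mathrm{SNR}) = \frac{1}{N} \sum_{n=0}^{N-1} \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\, \mathbf{H}_n \mathbf{H}_n^H\right) \leq \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T N}\, \boldsymbol{\mathcal{H}} \boldsymbol{\mathcal{H}}^H\right) \triangleq J(\mathrm{SNR}) \tag{30} $$

where the "Jensen channel" is an abstract channel characterized by the $m \times NM$ matrix defined as

$$ \boldsymbol{\mathcal{H}} = \begin{cases} [\mathbf{H}_0 \cdots \mathbf{H}_{N-1}], & \text{if } M_R \leq M_T, \\ [\mathbf{H}_0^H \cdots \mathbf{H}_{N-1}^H], & \text{if } M_R > M_T. \end{cases} \tag{31} $$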

In the following, we say that a Jensen outage $\mathcal{J}_r$ occurs if the Jensen channel $\boldsymbol{\mathcal{H}}$ is in outage w.r.t. the rate $R = r \log \mathrm{SNR}$, i.e., if $J(\mathrm{SNR}) < R$. Since $I(\mathrm{SNR}) \leq J(\mathrm{SNR})$ by (30), the corresponding outage probability, $P_J(R) = \mathbb{P}(J(\mathrm{SNR}) < R)$, clearly satisfies $P_J(R) \leq P_{\mathrm{out}}(R)$. The operational significance of the concept of a "Jensen outage" will be established at the end of this section. We shall first focus on characterizing the Jensen outage probability analytically.
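The construction (31) and the inequality (30) can be sketched as follows for both antenna configurations. This is a minimal illustration assuming numpy and i.i.d. channel draws (Jensen's inequality holds for every realization, so the correlation across slots is irrelevant for this check); the helper names are hypothetical.

```python
import numpy as np


def jensen_channel(H):
    """Jensen channel of (31) for a stack H of shape (N, M_R, M_T).

    Returns [H_0 ... H_{N-1}] if M_R <= M_T and [H_0^H ... H_{N-1}^H] otherwise,
    i.e., an m x N*M matrix.
    """
    N, M_R, M_T = H.shape
    blocks = list(H) if M_R <= M_T else [Hn.conj().T for Hn in H]
    return np.concatenate(blocks, axis=1)


def mutual_info(H, snr):
    """I(SNR) from (23), in nats."""
    N, M_R, M_T = H.shape
    G = H @ H.conj().transpose(0, 2, 1)
    return float(np.mean(np.linalg.slogdet(np.eye(M_R) + snr / M_T * G)[1]))


def jensen_info(H, snr):
    """J(SNR) from (30): log det(I + SNR/(M_T N) HJ HJ^H), HJ the Jensen channel."""
    N, M_R, M_T = H.shape
    HJ = jensen_channel(H)
    m = HJ.shape[0]
    return float(np.linalg.slogdet(np.eye(m) + snr / (M_T * N) * (HJ @ HJ.conj().T))[1])


rng = np.random.default_rng(2)
H = (rng.standard_normal((5, 2, 3)) + 1j * rng.standard_normal((5, 2, 3))) / np.sqrt(2)
print(mutual_info(H, 100.0), "<=", jensen_info(H, 100.0))
```

For $M_R > M_T$ the Hermitian-transposed blocks are used, which leaves each $\log\det$ term in (23) unchanged by Sylvester's determinant identity.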

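Based upon (15), one can show that the Jensen channel can be factored as $\boldsymbol{\mathcal{H}} = \boldsymbol{\mathcal{H}}_w (\mathbf{R}^{T/2} \otimes \mathbf{I}_M)$, where $\mathbf{R} = \mathbf{R}_{\mathbf{H}}$ if $M_R \leq M_T$, and $\mathbf{R} = \mathbf{R}_{\mathbf{H}}^T$ if $M_R > M_T$, and $\boldsymbol{\mathcal{H}}_w$ is the i.i.d. $\mathcal{CN}(0,1)$ matrix with the same dimensions as $\boldsymbol{\mathcal{H}}$, given by

$$ \boldsymbol{\mathcal{H}}_w = \begin{cases} [\mathbf{H}_{w,0} \cdots \mathbf{H}_{w,N-1}], & \text{if } M_R \leq M_T, \\ [\mathbf{H}_{w,0}^H \cdots \mathbf{H}_{w,N-1}^H], & \text{if } M_R > M_T. \end{cases} \tag{32} $$

Here, $\mathbf{H}_{w,n}$ denotes i.i.d. $\mathcal{CN}(0,1)$ matrices of dimension $M_R \times M_T$. Using $\boldsymbol{\mathcal{H}}_w \mathbf{U} \sim \boldsymbol{\mathcal{H}}_w$ for any unitary matrix $\mathbf{U}$, and $\lambda_n(\mathbf{R}_{\mathbf{H}}) = \lambda_n(\mathbf{R}_{\mathbf{H}}^T)$ for all $n$, we get $\boldsymbol{\mathcal{H}} \boldsymbol{\mathcal{H}}^H \sim \boldsymbol{\mathcal{H}}_w (\boldsymbol{\Lambda} \otimes \mathbf{I}_M) \boldsymbol{\mathcal{H}}_w^H$, where $\boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_1(\mathbf{R}_{\mathbf{H}}), \dots, \lambda_\rho(\mathbf{R}_{\mathbf{H}}), 0, \dots, 0\}$ and we have defined $\rho = \mathrm{rank}(\mathbf{R}_{\mathbf{H}})$. We therefore have

$$ J(\mathrm{SNR}) \sim \log\det\left(\mathbf{I} + \frac{\mathrm{SNR}}{M_T N}\, \boldsymbol{\mathcal{H}}_w (\boldsymbol{\Lambda} \otimes \mathbf{I}_M) \boldsymbol{\mathcal{H}}_w^H\right). $$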

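Next, observe that the following positive semidefinite ordering holds

$$ \lambda_1(\mathbf{R}_{\mathbf{H}})\, \mathrm{diag}\{\mathbf{I}_{\rho M}, \mathbf{0}\} \preceq \boldsymbol{\Lambda} \otimes \mathbf{I}_M \preceq \lambda_\rho(\mathbf{R}_{\mathbf{H}})\, \mathrm{diag}\{\mathbf{I}_{\rho M}, \mathbf{0}\}. \tag{33} $$

Since, as already noted, $\log\det(\mathbf{A})$ is increasing on the cone of positive definite matrices $\mathbf{A}$ [17, p. 111], we get the following bounds on the Jensen outage probability

$$
\begin{aligned}
\mathbb{P}\left(\log\det\left(\mathbf{I} + \lambda_\rho(\mathbf{R}_{\mathbf{H}}) \frac{\mathrm{SNR}}{M_T N}\, \tilde{\boldsymbol{\mathcal{H}}}_w \tilde{\boldsymbol{\mathcal{H}}}_w^H\right) < R\right) &\leq P_J(R) \\
&\leq \mathbb{P}\left(\log\det\left(\mathbf{I} + \lambda_1(\mathbf{R}_{\mathbf{H}}) \frac{\mathrm{SNR}}{M_T N}\, \tilde{\boldsymbol{\mathcal{H}}}_w \tilde{\boldsymbol{\mathcal{H}}}_w^H\right) < R\right)
\end{aligned} \tag{34}
$$

where $\tilde{\boldsymbol{\mathcal{H}}}_w = \boldsymbol{\mathcal{H}}_w([1:m], [1:\rho M])$. By the same line of reasoning as in (21), taking the exponential limit (in SNR) in (34) yields

$$ P_J(R) \doteq \mathbb{P}\left(\log\det\left(\mathbf{I} + \mathrm{SNR}\, \tilde{\boldsymbol{\mathcal{H}}}_w \tilde{\boldsymbol{\mathcal{H}}}_w^H\right) < R\right). \tag{35} $$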
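The high-SNR asymptotics of $P_J(R)$ can be expressed in terms of the singularity levels of the Jensen channel. Specifically, define $\boldsymbol{\alpha} = [\alpha_1 \cdots \alpha_m]^T$, where the singularity levels are given by

$$ \alpha_k = -\frac{\log \lambda_k(\tilde{\boldsymbol{\mathcal{H}}}_w \tilde{\boldsymbol{\mathcal{H}}}_w^H)}{\log \mathrm{SNR}}, \qquad k = 1,\dots,m \tag{36} $$

or, equivalently, $\lambda_k(\tilde{\boldsymbol{\mathcal{H}}}_w \tilde{\boldsymbol{\mathcal{H}}}_w^H) = \mathrm{SNR}^{-\alpha_k}$. Letting the data rate scale as $R(\mathrm{SNR}) = r \log \mathrm{SNR}$, it can be shown [1, Th. 4] that

$$ P_J(r \log \mathrm{SNR}) \doteq \mathbb{P}(\mathcal{J}_r) \tag{37} $$

where

$$ \mathcal{J}_r = \left\{ \boldsymbol{\alpha} \in \mathbb{R}_+^m :\ \alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m,\ \sum_{k=1}^{m} [1 - \alpha_k]^+ < r \right\}. $$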
The corresponding SNR exponent is defined as

$$ d_J(r) = -\lim_{\mathrm{SNR} \to \infty} \frac{\log \mathbb{P}(\mathcal{J}_r)}{\log \mathrm{SNR}}. $$

Based on (35), it follows immediately that $d_J(r)$ is nothing but the DM tradeoff curve of an effective MIMO channel with $\rho M$ transmit and $m$ receive antennas. We can therefore invoke [1, Th. 2] to infer that the Jensen DM tradeoff curve is the piecewise linear function connecting the points $(r, d_J(r))$ for $r = 0, \dots, m$, with

$$ d_J(r) = (\rho M - r)(m - r). \tag{38} $$

Since, as already noted, $P_J(R) \leq P_{\mathrm{out}}(R)$, it follows that $\mathbb{P}(\mathcal{J}_r) \,\dot{\leq}\, \mathbb{P}(\mathcal{O}_r)$. Moreover, by the outage bound [1, Lemma 5], we also get $d^*(r) \leq d_o(r)$. Hence, in summary, we have

$$ d(\mathcal{C}_r) \leq d^*(r) \leq d_o(r) \leq d_J(r), \qquad r \in [0, m], \tag{39} $$

for any family of codes $\mathcal{C}_r$. The optimal DM tradeoff curve $d^*(r)$ will be obtained in the next section by deriving a sufficient condition on $\mathcal{C}_r$ to guarantee that $d(\mathcal{C}_r) = d_J(r)$ and hence necessarily $d^*(r) = d_J(r)$.
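The piecewise-linear curve defined by the corner points (38) can be evaluated as in the sketch below, assuming numpy; `jensen_dmt` is a hypothetical helper name. For $\rho = 1$ it reduces to the flat-fading curve of [1], with corner points $(M - k)(m - k) = (M_T - k)(M_R - k)$.

```python
import numpy as np


def jensen_dmt(r, rho, M, m):
    """d_J(r) of (38): piecewise-linear curve through (k, (rho*M - k)(m - k)), k = 0, ..., m."""
    ks = np.arange(m + 1)
    return np.interp(r, ks, (rho * M - ks) * (m - ks))


# Example: M_T = M_R = 2 (so m = M = 2) and a covariance matrix of rank rho = 3
for r in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(r, jensen_dmt(r, rho=3, M=2, m=2))
```

Linear interpolation between integer multiplexing rates mirrors how [1, Th. 2] specifies the curve; `np.interp` also clamps outside $[0, m]$, which is harmless here since $r \in [0, m]$.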
IV. Jensen-optimal code design criterion

The goal of this section is to provide a sufficient condition on a family of codes $\mathcal{C}_r$ to have $d(\mathcal{C}_r) = d_J(r)$. By virtue of (39), this then proves that the optimal DM tradeoff is given by $d_J(r)$ and establishes a design criterion for DM tradeoff optimal codes. Corresponding code constructions are provided in Section V.

A. Code design criterion 

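In what follows, for any family of codes $\mathcal{C}_r$, we shall refer to the $N \times N$ matrix $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}$, where $\mathbf{E} = \mathbf{X} - \mathbf{X}'$ and $\mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})$, as the effective codeword difference matrix. Because the codeword difference matrix $\mathbf{E}$ depends on SNR (see Sec. III-A), so does $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}$ and any function thereof. In particular, we shall make the SNR-dependency of the eigenvalues of $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}$ explicit by introducing the notation

$$ \lambda_k(\mathrm{SNR}) = \lambda_k(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}), \qquad k = 1, \dots, \rho M_T \tag{40} $$

where $\lambda_1(\mathrm{SNR}) \leq \lambda_2(\mathrm{SNR}) \leq \cdots \leq \lambda_{\rho M_T}(\mathrm{SNR})$ for all SNRs.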

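The following two remarks are in order. First, we note that the remaining $N - \rho M_T$ eigenvalues of $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}$ are identically equal to zero for any effective codeword difference matrix arising from $\mathcal{C}_r(\mathrm{SNR})$ and for any SNR. This observation follows from $\mathrm{rank}(\mathbf{A} \odot \mathbf{B}) \leq \mathrm{rank}(\mathbf{A})\, \mathrm{rank}(\mathbf{B})$, where $\mathbf{A}$ and $\mathbf{B}$ are positive semidefinite matrices of equal dimensions [18, p. 458]. Since $\mathrm{rank}(\mathbf{R}_{\mathbf{H}}) = \rho$ and $\mathrm{rank}(\mathbf{E}^H \mathbf{E}) \leq \min(M_T, N) = M_T$ (recall that $N \geq \rho M_T$), we have $\mathrm{rank}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}) \leq \rho M_T$ for all $\mathbf{E} = \mathbf{X} - \mathbf{X}'$, $\mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})$, and all SNRs. In the sequel, we shall refer to the eigenvalues that are not identically equal to zero for all SNR values as nonzero eigenvalues.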

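Second, it is important to note that the eigenvalues $\lambda_k(\mathrm{SNR})$, $k = 1, \dots, \rho M_T$, are bounded above by a constant independent of SNR. To see this, note that, since $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}$ is positive semidefinite by the Schur product theorem,

$$
\begin{aligned}
\lambda_{\rho M_T}(\mathrm{SNR}) &\leq \mathrm{Tr}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E}) \\
&= \sigma_{\mathbb{H}}^2\, \mathrm{Tr}(\mathbf{E}^H \mathbf{E}) && (41) \\
&\leq 4 \sigma_{\mathbb{H}}^2 M_T N && (42)
\end{aligned}
$$

where (41) is a consequence of the fact that the variance of the fading coefficients is $\sigma_{\mathbb{H}}^2$, i.e., the diagonal entries of $\mathbf{R}_{\mathbf{H}}$ are all given by $\sigma_{\mathbb{H}}^2$, and (42) follows from (28) and $\mathbf{E} = \mathbf{X} - \mathbf{X}'$. Now, (42) is exponentially equal to $\mathrm{SNR}^0 = 1$, which, combined with the ordering imposed on the eigenvalues, shows that

$$ \lambda_k(\mathrm{SNR}) \,\dot{\leq}\, 1, \qquad k = 1, \dots, \rho M_T. \tag{43} $$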
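Both remarks can be checked on a small example. The sketch below assumes numpy and a hypothetical rank-$\rho$ covariance with unit diagonal built from $\rho$ DFT columns (an equal-power, OFDM-like model chosen only for illustration); codewords are random and scaled to meet (28) with equality. It verifies the rank bound $\rho M_T$, positive semidefiniteness, the trace identity behind (41), and the bound (42).

```python
import numpy as np


def effective_difference(R, E):
    """Effective codeword difference matrix R_H^T ⊙ E^H E (N x N) for an M_T x N matrix E."""
    return R.T * (E.conj().T @ E)


def dft_covariance(N, rho):
    """Hypothetical rank-rho covariance, unit diagonal: (1/rho) sum_{l<rho} e^{-j2pi(n-n')l/N}."""
    Fm = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(rho)) / N) / np.sqrt(rho)
    return Fm @ Fm.conj().T


rng = np.random.default_rng(3)
N, M_T, rho = 8, 2, 2
R = dft_covariance(N, rho)
sigma2 = R[0, 0].real


def random_codeword():
    """Random M_T x N codeword scaled so that ||X||_F^2 = N M_T, cf. (28)."""
    X = rng.standard_normal((M_T, N)) + 1j * rng.standard_normal((M_T, N))
    return X * np.sqrt(N * M_T) / np.linalg.norm(X)


E = random_codeword() - random_codeword()
D = effective_difference(R, E)
lam = np.linalg.eigvalsh(D)  # ascending order
print("rank:", np.linalg.matrix_rank(D, hermitian=True), "<= rho*M_T =", rho * M_T)
print("largest eigenvalue:", lam[-1], "trace:", np.trace(D).real, "bound (42):", 4 * sigma2 * M_T * N)
```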
We are now ready to present one of our main results. 

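Theorem 1: Consider a family of codes $\mathcal{C}_r$ with block length $N \geq \rho M_T$ that operates over the channel (12). For any effective codeword difference matrix, let its eigenvalues be given as in (40), and define

$$ \Xi^{\rho M_T}(\mathrm{SNR}) = \min_{\substack{\mathbf{E} = \mathbf{X} - \mathbf{X}',\ \mathbf{X} \neq \mathbf{X}' \\ \mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})}} \prod_{k=1}^{m} \lambda_k(\mathrm{SNR}) \tag{44} $$

where the superscript $\rho M_T$ in $\Xi^{\rho M_T}(\mathrm{SNR})$ emphasizes the fact that there are exactly $\rho M_T$ nonzero eigenvalues. If $\mathcal{C}_r$ is such that

$$ \Xi^{\rho M_T}(\mathrm{SNR}) \,\dot{\geq}\, \mathrm{SNR}^{-(r - \epsilon)} \tag{45} $$

for some $\epsilon > 0$ that is constant w.r.t. SNR and $r$, then the corresponding error probability satisfies

$$ P_e(\mathcal{C}_r) \doteq \mathrm{SNR}^{-d_J(r)}. $$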

Proof: Appendix H 

As a direct consequence of Theorem 1, a family of codes $\mathcal{C}_r$ that satisfies (45) realizes a DM tradeoff curve $d(\mathcal{C}_r) = d_J(r)$ and hence, by (39), we obtain

$$ d^*(r) = d_J(r). \tag{46} $$

The optimal DM tradeoff curve for selective-fading MIMO channels is therefore given by the DM tradeoff curve of the associated Jensen channel. Put differently, Theorem 1 shows that, even though $\mathcal{J}_r \subseteq \mathcal{O}_r$ by definition, we still have

$$ \mathbb{P}(\mathcal{J}_r) \doteq \mathbb{P}(\mathcal{O}_r) $$

which essentially says that the "original" channel has the same high-SNR outage behavior as its associated Jensen channel. To complete the picture, it remains to show that families of codes satisfying the design criterion (45) indeed exist. This will be done in Section V by providing systematic DM tradeoff optimal code constructions.
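For a single codebook $\mathcal{C}_r(\mathrm{SNR})$, the quantity (44) can be computed by brute force over all codeword pairs, as sketched below. This assumes numpy, a hypothetical rank-$\rho$ DFT-type covariance, and a random toy codebook; the helper names are hypothetical. Condition (45) concerns how $\Xi^{\rho M_T}(\mathrm{SNR})$ scales across the whole family, so checking it in practice would mean evaluating this function for codebooks at increasing SNR and comparing the slope of $\log \Xi$ versus $\log \mathrm{SNR}$ with $-(r-\epsilon)$; the sketch does not attempt that.

```python
import itertools
import numpy as np


def dft_covariance(N, rho):
    """Hypothetical rank-rho covariance with unit diagonal (equal-power DFT-type model)."""
    Fm = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(rho)) / N) / np.sqrt(rho)
    return Fm @ Fm.conj().T


def xi(codebook, R, M_T, rho, m):
    """Xi^{rho M_T}(SNR) of (44) for one codebook.

    For every pair X != X', keep the rho*M_T largest eigenvalues of R_H^T ⊙ E^H E
    (the others vanish identically) and multiply the m smallest of those.
    """
    best = np.inf
    for X, Xp in itertools.combinations(codebook, 2):
        E = X - Xp
        lam = np.linalg.eigvalsh(R.T * (E.conj().T @ E))[-rho * M_T:]  # ascending order
        best = min(best, float(np.prod(lam[:m])))
    return best


# Toy example: M_T = M_R = 1 (so m = 1), N = 4, rho = 2, 16 random codewords
rng = np.random.default_rng(5)
N, M_T, rho, m = 4, 1, 2, 1
R = dft_covariance(N, rho)
codebook = [rng.standard_normal((M_T, N)) + 1j * rng.standard_normal((M_T, N)) for _ in range(16)]
codebook = [X * np.sqrt(N * M_T) / np.linalg.norm(X) for X in codebook]  # meet (28) with equality
value = xi(codebook, R, M_T, rho, m)
print("Xi =", value)
```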
B. Interpretation of the code design criterion

We shall next discuss the relation of the code design criterion (45) to results available in the literature.

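Non-vanishing determinant criterion and approximate universality: The non-vanishing determinant criterion [2], [3], which is well known for flat-fading MIMO channels, can be recovered from the code design criterion in Theorem 1 as follows. In the flat-fading case, the channel covariance matrix satisfies $\mathbf{R}_{\mathbf{H}} = \mathbf{1}_N$ with $\rho = 1$, and we hence have $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H \mathbf{E} = \mathbf{E}^H \mathbf{E}$ for all possible $\mathbf{E} = \mathbf{X} - \mathbf{X}'$. It follows that the quantity defined in (44) specializes to

$$ \Xi^{M_T}(\mathrm{SNR}) = \min_{\substack{\mathbf{E} = \mathbf{X} - \mathbf{X}',\ \mathbf{X} \neq \mathbf{X}' \\ \mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})}} \prod_{k=1}^{m} \lambda_k(\mathbf{E} \mathbf{E}^H). \tag{47} $$

For $M_T \leq M_R$, we have

$$ \Xi^{M_T}(\mathrm{SNR}) = \min_{\substack{\mathbf{E} = \mathbf{X} - \mathbf{X}',\ \mathbf{X} \neq \mathbf{X}' \\ \mathbf{X}, \mathbf{X}' \in \mathcal{C}_r(\mathrm{SNR})}} \det(\mathbf{E} \mathbf{E}^H) $$

and condition (45) simply requires that $\det(\mathbf{E} \mathbf{E}^H) \,\dot{\geq}\, \mathrm{SNR}^{-(r - \epsilon)}$, $\epsilon > 0$, for all codeword difference matrices $\mathbf{E}$. Letting $\tilde{\mathbf{X}} = \sqrt{\mathrm{SNR}^{r/m}}\, \mathbf{X}$ and $\tilde{\mathbf{E}} = \tilde{\mathbf{X}} - \tilde{\mathbf{X}}'$, it can be readily seen that condition (45) is equivalent to $\det(\tilde{\mathbf{E}} \tilde{\mathbf{E}}^H) \,\dot{\geq}\, \mathrm{SNR}^{\epsilon}$. By taking $\epsilon \to 0$, we get that $\det(\tilde{\mathbf{E}} \tilde{\mathbf{E}}^H)$ must be non-vanishing for increasing SNRs (and hence increasing data rates $R(\mathrm{SNR})$). Examples of code constructions that satisfy the non-vanishing determinant criterion, and which are hence DM tradeoff optimal over i.i.d. Rayleigh flat-fading MIMO channels, can be found in [2]-[6].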
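The equivalence used above can be spelled out in one line. Assuming $M_T \leq M_R$, so that $m = M_T$ and $\mathbf{E}\mathbf{E}^H$ is $m \times m$, the scaling $\tilde{\mathbf{X}} = \sqrt{\mathrm{SNR}^{r/m}}\,\mathbf{X}$ multiplies each of the $m$ eigenvalues of $\mathbf{E}\mathbf{E}^H$ by $\mathrm{SNR}^{r/m}$, so that (the reverse direction follows by dividing by $\mathrm{SNR}^{r}$):

```latex
\det(\tilde{\mathbf{E}}\tilde{\mathbf{E}}^H)
  = \det\!\left(\mathrm{SNR}^{r/m}\,\mathbf{E}\mathbf{E}^H\right)
  = \mathrm{SNR}^{r}\,\det(\mathbf{E}\mathbf{E}^H)
  \;\dot{\geq}\; \mathrm{SNR}^{r}\,\mathrm{SNR}^{-(r-\epsilon)}
  = \mathrm{SNR}^{\epsilon}
```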

The code design criterion of Theorem 1 also encompasses the approximate universality criterion in [8] for flat-fading MIMO channels. This can be seen by specializing (45) to the case $\rho = 1$, i.e.,

$$ \Xi^{M_T}(\mathrm{SNR}) \,\dot{\geq}\, \mathrm{SNR}^{-(r - \epsilon)}, \qquad \epsilon > 0 \tag{48} $$

and comparing (48) to the criterion given in [8, Theorem 3.1]. The coincidence of the approximate universality criterion and (45) (in flat fading) is noteworthy, as the two criteria are arrived at using completely different assumptions and correspondingly different proof techniques: while our result is based on explicit assumptions on the channel fading statistics, the approximate universality condition guarantees a small error probability over every channel realization that is not in outage and, hence, DM tradeoff optimal performance for every fading distribution.

Relation to classical space-time code design criteria: Next, we specialize our code design criterion to multiplexing rate $r = 0$, i.e., the data rate is fixed and does not increase with SNR, in which case the same codebook can be used for all SNR values. Note that this implies that the eigenvalues in (40) are no longer functions of SNR. From Theorem 1 it follows that the codebook is DM tradeoff optimal if it satisfies $\Xi^{\rho M_T}(\mathrm{SNR}) \ge \mathrm{SNR}^{\epsilon}$, $\epsilon > 0$, or, equivalently, if every effective codeword difference matrix $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}$ has $\rho M_T$ nonzero eigenvalues. In other words, the sufficient condition for DM tradeoff optimality at $r = 0$ can be stated as

$$\operatorname{rank}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}) = \rho M_T, \quad \forall\, \mathbf{E} = \mathbf{X} - \mathbf{X}',\ \mathbf{X} \neq \mathbf{X}',\ \mathbf{X}, \mathbf{X}' \in \mathcal{C}_r. \tag{49}$$

This is precisely the code design criterion found in the SISO case in [19] using the same channel model as here, and in [20] in the context of MIMO-OFDM modulation.

C. Geometric interpretation of the optimal DM tradeoff 

In the following, we provide a geometric interpretation of the optimal DM tradeoff. The discussion closely follows the corresponding analysis for the flat-fading case reported in [1]. To simplify the exposition, we consider OFDM modulation over ISI channels and start by noting that in an OFDM system with $N$ tones the I/O-relation (after discarding the cyclic prefix at the receiver) is given by (12) with

$$\mathbf{H}_n = \sum_{l=0}^{L-1} \mathbf{H}(l)\, e^{-j2\pi l n/N} \tag{50}$$

where $\mathbf{H}(l)$, $l = 0, \ldots, L-1$, denote the i.i.d. matrix-valued channel taps with $\mathcal{CN}(0,1)$ entries. The corresponding mutual information (23) can thus be written as

$$I(\mathrm{SNR}) = \frac{1}{N}\log\det\Big(\mathbf{I}_{NM_R} + \frac{\mathrm{SNR}}{M_T}\,\mathbf{D}_{\mathbf{H}}\mathbf{D}_{\mathbf{H}}^H\Big)$$

where we recall that $\mathbf{D}_{\mathbf{H}} = \operatorname{diag}\{\mathbf{H}_n\}_{n=0}^{N-1}$. Following the geometric argument in the flat-fading case [1], we wish to relate the outage probability at multiplexing rate $r$ to the rank of the matrix $\mathbf{D}_{\mathbf{H}}$. Unfortunately, $\operatorname{rank}(\mathbf{D}_{\mathbf{H}})$ is, in general, difficult to characterize because the corresponding diagonal blocks are correlated due to (50). In an OFDM system, however, the matrix $\mathbf{D}_{\mathbf{H}}\mathbf{D}_{\mathbf{H}}^H$ can readily be shown to be unitarily equivalent to $\mathbf{C}_{\mathbf{H}}\mathbf{C}_{\mathbf{H}}^H$, where $\mathbf{C}_{\mathbf{H}}$ is the following $NM_R \times NM_T$
block-circulant matrix

$$\mathbf{C}_{\mathbf{H}} = \begin{pmatrix}
\mathbf{H}(0) & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}(L-1) & \cdots & \mathbf{H}(1)\\
\mathbf{H}(1) & \mathbf{H}(0) & \ddots & & \ddots & \ddots & \vdots\\
\vdots & \ddots & \ddots & \ddots & & \ddots & \mathbf{H}(L-1)\\
\mathbf{H}(L-1) & & \ddots & \ddots & \ddots & & \mathbf{0}\\
\mathbf{0} & \ddots & & \ddots & \ddots & \ddots & \vdots\\
\vdots & \ddots & \ddots & & \ddots & \ddots & \mathbf{0}\\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}(L-1) & \cdots & \mathbf{H}(1) & \mathbf{H}(0)
\end{pmatrix}$$

i.e., its $(i,j)$th $M_R \times M_T$ block equals $\mathbf{H}((i-j) \bmod N)$ if $(i-j) \bmod N < L$ and $\mathbf{0}$ otherwise.

For $N > L$ (which is satisfied in any OFDM system), the structure of $\mathbf{C}_{\mathbf{H}}$ implies that its rank is completely determined by the rank of its first $M_T$ columns in the case $M_T \le M_R$ and by the rank of its last $M_R$ rows in the case $M_R < M_T$. More specifically, $\operatorname{rank}(\mathbf{C}_{\mathbf{H}})$ satisfies (for every channel realization)

$$\operatorname{rank}(\mathbf{C}_{\mathbf{H}}) = N\operatorname{rank}(\mathbf{C}_w) \tag{51}$$

where

$$\mathbf{C}_w = \begin{cases} \mathbf{C}_{\mathbf{H}}([1:LM_R],\,[1:M_T])^T, & \text{if } M_T \le M_R\\ \mathbf{C}_{\mathbf{H}}([(N-1)M_R+1:NM_R],\,[(N-L)M_T+1:NM_T]), & \text{if } M_T > M_R. \end{cases}$$

Note that $\mathbf{C}_w$ is an $m \times LM$ matrix with i.i.d. $\mathcal{CN}(0,1)$ entries and that it is equal in distribution to $\mathcal{H}_w$ (cf. (105) and (108)) obtained from the Jensen channel. It follows from (51) that, in order to characterize $\operatorname{rank}(\mathbf{C}_{\mathbf{H}})$, it suffices to characterize $\operatorname{rank}(\mathbf{C}_w)$. In particular, following [1], we are interested in the number of parameters required to specify a matrix $\mathbf{C}_{\mathbf{H}}$ of rank $Nr$ or, equivalently, a matrix $\mathbf{C}_w$ of rank $r$. This number is obtained as follows: $LMr$ parameters are required to specify $r$ linearly independent rows in $\mathbf{C}_w$. The remaining $m - r$ rows are then given by linear combinations of these $r$ linearly independent rows. Specifying these linearly dependent rows requires $r$ parameters per row (i.e., the coefficients in the linear combinations of the $r$ linearly independent rows) and hence $(m-r)r$ parameters overall. The total number of parameters specifying a matrix $\mathbf{C}_{\mathbf{H}}$ of rank $Nr$ is therefore

$$LMr + (m-r)r = LMm - (LM-r)(m-r). \tag{52}$$

Now, following the reasoning in [1, Sec. 3.2], we can conclude that an outage at multiplexing rate $r$ occurs when $\mathbf{C}_w$ is close to the manifold of all rank-$r$ matrices. This requires a collapse in the components of $\mathbf{C}_w$ in all the dimensions³ orthogonal to that manifold; the number of such dimensions is given by $(LM-r)(m-r)$, which is precisely the SNR exponent given in (38) and hence concludes the argument.
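The block-circulant structure and the rank relation (51) can be illustrated numerically. The sketch below is an illustration under stated assumptions, not code from the paper: `block_circulant` is a hypothetical helper, the taps are one random i.i.d. Gaussian realization, and the unitary DFT convention $e^{-j2\pi nk/N}/\sqrt{N}$ is assumed. It checks that the block-DFT of $\mathbf{C}_{\mathbf{H}}$ yields $\mathbf{D}_{\mathbf{H}}$ built from (50), that $\operatorname{rank}(\mathbf{C}_{\mathbf{H}}) = N\operatorname{rank}(\mathbf{C}_w)$ for this realization, and the parameter-count identity (52).

```python
import numpy as np

def block_circulant(taps, N):
    """N*M_R x N*M_T block-circulant matrix; block (i, j) = H((i - j) mod N) if < L, else 0."""
    L = len(taps)
    M_R, M_T = taps[0].shape
    C = np.zeros((N * M_R, N * M_T), dtype=complex)
    for i in range(N):            # block row
        for l in range(L):        # tap index
            j = (i - l) % N       # block column holding H(l) in block row i
            C[i*M_R:(i+1)*M_R, j*M_T:(j+1)*M_T] = taps[l]
    return C

rng = np.random.default_rng(2)
M_T, M_R, L, N = 2, 3, 2, 5
taps = [(rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
        for _ in range(L)]
C = block_circulant(taps, N)

# Per-tone channel matrices H_n from (50), stacked into D_H = diag{H_n}
k = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)     # unitary DFT matrix
D_H = np.zeros((N * M_R, N * M_T), dtype=complex)
for n in range(N):
    H_n = sum(taps[l] * np.exp(-2j * np.pi * l * n / N) for l in range(L))
    D_H[n*M_R:(n+1)*M_R, n*M_T:(n+1)*M_T] = H_n

# Block-DFT diagonalization: (F ⊗ I_{M_R}) C (F^H ⊗ I_{M_T}) = D_H
diag_ok = np.allclose(np.kron(F, np.eye(M_R)) @ C @ np.kron(F.conj().T, np.eye(M_T)), D_H)

# Rank relation (51) for M_T <= M_R: C_w = C([1:L M_R], [1:M_T])^T
C_w = C[:L * M_R, :M_T].T
rank_C = np.linalg.matrix_rank(C, tol=1e-8)
rank_Cw = np.linalg.matrix_rank(C_w, tol=1e-8)

# Parameter count (52): LMr + (m - r)r == LMm - (LM - r)(m - r)
m, M = min(M_T, M_R), max(M_T, M_R)
counts_ok = all(L*M*r + (m - r)*r == L*M*m - (L*M - r)*(m - r) for r in range(m + 1))
```

The check uses a single random draw, so it illustrates (51) for a generic realization rather than proving it.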



D. Particularizing the design criterion (49) to ISI channels

Condition (49) can be stated in a form that yields geometric insight into the code design problem and nicely reveals the code design criterion reported in [20] for frequency-selective MIMO channels as a special case. We start by stating the following result in full generality and will then specialize it to the case of ISI channels.

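Proposition 1: Let $\mathbf{R}_{\mathbf{H}} = \sum_{n=0}^{\rho-1}\lambda_n\mathbf{u}_n\mathbf{u}_n^H$ be the eigenvalue decomposition of the channel covariance matrix. Then, (49) holds if and only if

$$\mathbf{A} = \big[\sqrt{\lambda_0}\,\mathbf{D}_{\mathbf{u}_0}^*\mathbf{E}^H \ \cdots\ \sqrt{\lambda_{\rho-1}}\,\mathbf{D}_{\mathbf{u}_{\rho-1}}^*\mathbf{E}^H\big]$$

has full rank (i.e., rank $\rho M_T$).

Proof: Based on the eigenvalue decomposition of $\mathbf{R}_{\mathbf{H}}$, we get

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \Big(\sum_{n=0}^{\rho-1}\lambda_n\mathbf{u}_n^*\mathbf{u}_n^T\Big) \odot \mathbf{E}^H\mathbf{E} \tag{53}$$

$$= \sum_{n=0}^{\rho-1}\lambda_n\mathbf{D}_{\mathbf{u}_n}^*\mathbf{E}^H\mathbf{E}\,\mathbf{D}_{\mathbf{u}_n} \tag{54}$$

$$= \mathbf{A}\mathbf{A}^H \tag{55}$$

where (54) follows from the fact that $(\mathbf{a}\mathbf{b}^T) \odot \mathbf{C} = \mathbf{D}_{\mathbf{a}}\mathbf{C}\mathbf{D}_{\mathbf{b}}$ for any $n \times 1$ vectors $\mathbf{a}$, $\mathbf{b}$ and any $n \times n$ matrix $\mathbf{C}$. The proof is concluded upon noting that $\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}\mathbf{A}^H)$. ■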
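A minimal NumPy sketch of the factorization (53)-(55), assuming a hypothetical rank-$\rho$ covariance built from a random unitary matrix and a random codeword difference matrix (neither comes from the paper). It builds $\mathbf{A}$ column-block by column-block and checks $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \mathbf{A}\mathbf{A}^H$ and the rank equality used in the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M_T, rho = 6, 2, 3

# Hypothetical rank-rho covariance R = sum_n lam_n u_n u_n^H
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
lam = np.array([0.5, 1.0, 2.0])
R = (U[:, :rho] * lam) @ U[:, :rho].conj().T

E = rng.standard_normal((M_T, N)) + 1j * rng.standard_normal((M_T, N))
G = R.T * (E.conj().T @ E)                       # effective matrix R^T ⊙ E^H E

# A = [sqrt(lam_0) D_{u_0}^* E^H ... sqrt(lam_{rho-1}) D_{u_{rho-1}}^* E^H], size N x rho*M_T
A = np.hstack([np.sqrt(lam[n]) * np.diag(U[:, n].conj()) @ E.conj().T for n in range(rho)])

identity_ok = np.allclose(G, A @ A.conj().T)
rank_G = np.linalg.matrix_rank(G, tol=1e-8)
rank_A = np.linalg.matrix_rank(A, tol=1e-8)
```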

We note that a decomposition of the effective codeword difference matrix similar to that in (53) has also been reported for the SISO case in [19].

Specialization to the ISI channel case: We shall next specialize Proposition 1 to the ISI channel case and recover the code design criterion reported in [20] for MIMO ISI channels. In an OFDM system, as considered in [20], the channel's covariance matrix is given by

$$\mathbf{R}_{\mathbf{H}} = \boldsymbol{\Psi}\operatorname{diag}\{\sigma_0^2, \ldots, \sigma_{L-1}^2, 0, \ldots, 0\}\boldsymbol{\Psi}^H \tag{56}$$

where the $\{\sigma_l^2\}$ correspond to the power-delay profile which, for simplicity of exposition, we assume to be given by $\sigma_l^2 = 1$, for all $l$, throughout this section. Since $\mathbf{R}_{\mathbf{H}}$ is diagonalized by the FFT matrix $\boldsymbol{\Psi}$, we have $\mathbf{D}_{\mathbf{u}_n}^* = \frac{1}{\sqrt{N}}\mathbf{D}^n$, where $\mathbf{D} = \operatorname{diag}\{e^{j2\pi k/N}\}_{k=0}^{N-1}$. Hence, based on (56), (53) specializes to [20] $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \mathbf{A}\mathbf{A}^H$ with

$$\mathbf{A} = \frac{1}{\sqrt{N}}\big[\mathbf{D}^0\mathbf{E}^H \ \cdots\ \mathbf{D}^{L-1}\mathbf{E}^H\big].$$

3 We refer to [15, note on p. 397] for an argument on why it is meaningful to talk about orthogonal dimensions even though the manifold of all rank-r matrices is not a linear subspace.

Since the rank of a matrix is unaltered by left multiplication by a full-rank matrix, we can equivalently consider the matrix $\boldsymbol{\Psi}^T\mathbf{A}$. In particular, we note that $\boldsymbol{\Psi}^T\mathbf{D}^n\mathbf{E}^H = \boldsymbol{\Pi}^n\mathbf{E}_t^H$, where $\boldsymbol{\Pi} = [\boldsymbol{\pi}_1 \cdots \boldsymbol{\pi}_{N-1}\ \boldsymbol{\pi}_0]$, with $\pi_k(n) = 1$ for $k = n$ and $\pi_k(n) = 0$ otherwise, is the basic circulant permutation matrix and $\mathbf{E}_t = \mathbf{E}\boldsymbol{\Psi}^*$ is a time-domain representation of the codeword difference matrix. The code design criterion for $r = 0$ in the ISI case therefore amounts to
ensuring that the matrix

$$\big[\boldsymbol{\Pi}^0\mathbf{E}_t^H \ \cdots\ \boldsymbol{\Pi}^{L-1}\mathbf{E}_t^H\big] \tag{57}$$

has full rank for all codeword difference matrices, which is precisely the code design criterion reported in [20], [21]. Requiring the matrix in (57) to have full rank for all $\mathbf{E}_t$ essentially amounts to saying that the code should be designed such that the receiver can separate the shifted versions of the transmit signal.
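The shift identity $\boldsymbol{\Psi}^T\mathbf{D}^n\mathbf{E}^H = \boldsymbol{\Pi}^n\mathbf{E}_t^H$ and the "separable shifts" reading of (57) can be illustrated with a short NumPy sketch. It assumes the unitary FFT convention $[\boldsymbol{\Psi}]_{k,n} = e^{-j2\pi kn/N}/\sqrt{N}$; `isi_matrix` and the two-antenna "cyclic delay by $d$" difference matrices are hypothetical examples, not designs from the paper. With $L = 2$ taps, a one-sample delay produces coinciding shifted columns (rank deficiency), whereas a two-sample delay keeps all $LM_T$ shifted versions separable.

```python
import numpy as np

N, L, M_T = 8, 2, 2
k = np.arange(N)
Psi = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)   # unitary FFT matrix
D = np.diag(np.exp(2j * np.pi * k / N))
Pi = np.roll(np.eye(N), 1, axis=0)                             # [pi_1 ... pi_{N-1} pi_0]

rng = np.random.default_rng(4)
E = rng.standard_normal((M_T, N)) + 1j * rng.standard_normal((M_T, N))
E_t = E @ Psi.conj()                                           # time-domain difference matrix

shift_ok = all(np.allclose(Psi.T @ np.linalg.matrix_power(D, n) @ E.conj().T,
                           np.linalg.matrix_power(Pi, n) @ E_t.conj().T)
               for n in range(L))

def isi_matrix(E_t, L):
    """The matrix [Pi^0 E_t^H ... Pi^{L-1} E_t^H] from (57)."""
    return np.hstack([np.linalg.matrix_power(Pi, l) @ E_t.conj().T for l in range(L)])

rank_generic = np.linalg.matrix_rank(isi_matrix(E_t, L), tol=1e-8)

# Hypothetical example: antenna 2 sends antenna 1's time-domain signal cyclically delayed by d
e1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
def delayed(d):
    return np.vstack([e1, np.roll(e1, d)])

rank_d1 = np.linalg.matrix_rank(isi_matrix(delayed(1), L), tol=1e-8)   # shifts collide
rank_d2 = np.linalg.matrix_rank(isi_matrix(delayed(2), L), tol=1e-8)   # shifts separable
```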

Prior results on the DM tradeoff for ISI channels: We shall next specialize our results to frequency-selective fading MIMO channels, recovering the results reported previously in [9], [10]. Assuming a frequency-selective fading channel with $L$ taps that are i.i.d. $\mathcal{CN}(0,1)$ and a cyclic I/O-relation (as in an OFDM system), the covariance matrix is again given by (56) with $\rho = \operatorname{rank}(\mathbf{R}_{\mathbf{H}}) = L$. Inserting $\rho = L$ into (38) and using (46) yields the optimal DM tradeoff curve as the piecewise linear function connecting the points $(r, d^*(r))$ for $r = 0, \ldots, m$, with

$$d^*(r) = (LM - r)(m - r). \tag{58}$$

This is the optimal DM tradeoff curve for frequency-selective fading MIMO channels reported previously in [10]. Specializing (58) to the single-antenna case $M_T = M_R = 1$ and noting that $d^*(r) = (L - r)(1 - r) = L(1 - r)$ for $r = 0, 1$ recovers the result reported in [9]. We note that the proof techniques employed in [9], [10] are different from the approach taken in this paper and seem to be tailored to the frequency-selective case. In addition, since Theorem 1 only requires $N \ge LM_T$, our result is not limited to large block lengths as required in [9], [10].

Finally, we note that the achievable DM tradeoff curve reported in [1] for the case where coding is performed across $L$ independent MIMO channels is given by

$$d_1(r) = L(M - r)(m - r).$$

We clearly have $d_1(r) \le d^*(r)$ for all multiplexing rates and all possible values of $m$ and $M$, since $L(M - r) = LM - Lr \le LM - r$ for $L \ge 1$.
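To visualize the comparison, the sketch below evaluates both curves as piecewise linear interpolations of their integer corner points and checks $d_1(r) \le d^*(r)$ on a grid. The parameter values ($L = 3$, $M_T = 2$, $M_R = 4$) are arbitrary illustrative choices; `dm_curve` is a hypothetical helper.

```python
import numpy as np

def dm_curve(corner_fn, m, r):
    """Piecewise-linear curve through the points (j, corner_fn(j)), j = 0, ..., m."""
    js = np.arange(m + 1)
    return np.interp(r, js, [corner_fn(j) for j in js])

L, M_T, M_R = 3, 2, 4                 # illustrative parameters
m, M = min(M_T, M_R), max(M_T, M_R)

def d_star(j):                        # (58): optimal curve with rho = L
    return (L * M - j) * (m - j)

def d_1(j):                           # [1]: coding across L independent MIMO channels
    return L * (M - j) * (m - j)

r = np.linspace(0.0, m, 41)
gap = dm_curve(d_star, m, r) - dm_curve(d_1, m, r)
```

The two curves coincide at $r = 0$ and $r = m$ and differ strictly in between (for example, at $r = 1$ the gap is $(LM - 1) - L(M - 1) = L - 1$).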

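The case of linear convolution: For linear convolution, as encountered in single-carrier modulation, the code design criterion for $r = 0$ is obtained by replacing $\boldsymbol{\Pi}$ in (57) by the forward shift matrix [18] and ensuring that the resulting matrix has full rank for all codeword difference matrices. To see this, consider the following I/O-relation

$$\mathbf{y}[n] = \sqrt{\frac{\mathrm{SNR}}{M_T}}\sum_{l=0}^{L-1}\mathbf{H}(l)\,\mathbf{x}[n-l] + \mathbf{z}[n] \tag{59}$$

where $\mathbf{y}[n]$, $\mathbf{x}[n]$, and $\mathbf{z}[n]$ denote the received, transmitted, and noise vector sequences, respectively. We assume that $\mathbf{x}[n] = \mathbf{0}$ for $n < 0$ and $n > N - L$, and consider the time interval $n = 0, \ldots, N-1$. Stacking the received signal vectors according to $\mathbf{Y} = [\mathbf{y}[0] \cdots \mathbf{y}[N-1]]$ and the channel taps as $\mathbf{H} = [\mathbf{H}(0) \cdots \mathbf{H}(L-1)]$, the resulting I/O-relation can be written as

$$\mathbf{Y} = \sqrt{\frac{\mathrm{SNR}}{M_T}}\,\mathbf{H}\mathcal{X} + \mathbf{Z} \tag{60}$$

where $\mathbf{Z} = [\mathbf{z}[0] \cdots \mathbf{z}[N-1]]$ and the $LM_T \times N$ transmit signal matrix is given by

$$\mathcal{X} = \begin{pmatrix}
\mathbf{x}[0] & \mathbf{x}[1] & \cdots & \mathbf{x}[N-L] & \mathbf{0} & \cdots & \mathbf{0}\\
\mathbf{0} & \mathbf{x}[0] & \mathbf{x}[1] & \cdots & \mathbf{x}[N-L] & \ddots & \vdots\\
\vdots & \ddots & \ddots & \ddots & & \ddots & \mathbf{0}\\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{x}[0] & \mathbf{x}[1] & \cdots & \mathbf{x}[N-L]
\end{pmatrix}.$$

Consequently, any codeword difference matrix $\mathcal{E} = \mathcal{X} - \mathcal{X}'$ has the structure

$$\mathcal{E} = \big[\mathbf{S}^0\mathbf{E}^H \ \cdots\ \mathbf{S}^{L-1}\mathbf{E}^H\big]^H \tag{61}$$

where $\mathbf{S}$ denotes the forward shift matrix and, here, $\mathbf{E} = [\mathbf{e}[0] \cdots \mathbf{e}[N-L]\ \mathbf{0} \cdots \mathbf{0}]$ with $\mathbf{e}[n] = \mathbf{x}[n] - \mathbf{x}'[n]$. Comparing (61) with (57) shows that the code design criterion follows from (57) by replacing the cyclic shifts by linear shifts and ensuring full rank of the resulting codeword difference matrices [20].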
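The stacking in (59)-(61) can be sanity-checked with a small NumPy sketch. It assumes real-valued random inputs and taps purely for illustration, omits the noise and the $\sqrt{\mathrm{SNR}/M_T}$ factor, and uses a hypothetical helper `stacked_input`. It verifies that $\mathbf{H}\mathcal{X}$ reproduces the linear convolution in (59) and that block row $l$ of $\mathcal{X}$ is the zero-padded input shifted by the forward shift matrix $\mathbf{S}^l$, which is exactly the structure exploited in (61).

```python
import numpy as np

def stacked_input(x, L):
    """LM_T x N block-Toeplitz matrix: block row l holds x[0..N-L] shifted right by l."""
    M_T, n_sym = x.shape              # n_sym = N - L + 1 nonzero input vectors
    N = n_sym + L - 1
    X = np.zeros((L * M_T, N), dtype=x.dtype)
    for l in range(L):
        X[l*M_T:(l+1)*M_T, l:l+n_sym] = x
    return X

rng = np.random.default_rng(5)
M_T, M_R, L, N = 2, 2, 3, 7
x = rng.standard_normal((M_T, N - L + 1))
taps = [rng.standard_normal((M_R, M_T)) for _ in range(L)]

# Direct evaluation of sum_l H(l) x[n-l], n = 0..N-1
y_direct = np.zeros((M_R, N))
for n in range(N):
    for l in range(L):
        if 0 <= n - l < x.shape[1]:
            y_direct[:, n] += taps[l] @ x[:, n - l]

H = np.hstack(taps)                   # [H(0) ... H(L-1)]
X = stacked_input(x, L)
y_stacked = H @ X

# Block row l equals the zero-padded input shifted by the forward shift matrix S^l
S = np.eye(N, k=-1)                   # ones on the first subdiagonal: (S v)[i] = v[i-1]
x_pad = np.hstack([x, np.zeros((M_T, L - 1))])
blocks_ok = all(np.allclose(X[l*M_T:(l+1)*M_T], (np.linalg.matrix_power(S, l) @ x_pad.T).T)
                for l in range(L))
```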



E. Block-fading channels 

In the block-fading channel model, the channel remains unchanged during a block of, say, $L$ time slots and changes in a statistically independent fashion across blocks. We consider $B$ such independent blocks for which the I/O-relation (12) holds with $N = BL$ and

$$\mathbf{H}_n = \mathbf{H}^{(\lfloor n/L \rfloor + 1)}, \quad n = 0, \ldots, N-1$$

where $\mathbf{H}^{(b)}$, $b = 1, \ldots, B$, denotes the channel matrix with i.i.d. $\mathcal{CN}(0,1)$ entries corresponding to the $b$th block. The $BL \times BL$ channel covariance matrix $\mathbf{R}_{\mathrm{BF}}$ is therefore given by

$$\mathbf{R}_{\mathrm{BF}} = \mathbf{I}_B \otimes \mathbf{1}_L$$

with $\operatorname{rank}(\mathbf{R}_{\mathrm{BF}}) = B$, where $\mathbf{1}_L$ denotes the $L \times L$ all-ones matrix. The corresponding Jensen DM tradeoff curve is the piecewise linear function connecting the points $(r, d_J(r))$ for $r = 0, \ldots, m$, where $d_J(r) = (BM - r)(m - r)$.

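Theorem 1 provides a sufficient condition for a family of codes $\mathcal{C}_r$ with block length $N \ge BM_T$ to achieve the optimal DM tradeoff curve. In the block-fading case, every codeword $\mathbf{X} \in \mathcal{C}_r(\mathrm{SNR})$ can be partitioned into $B$ blocks of size $M_T \times L$ according to $\mathbf{X} = [\mathbf{X}_1 \cdots \mathbf{X}_B]$ and, similarly, any codeword difference matrix $\mathbf{E} = \mathbf{X} - \mathbf{X}'$ can be represented as $\mathbf{E} = [\mathbf{E}_1 \cdots \mathbf{E}_B]$, where $\mathbf{E}_b = \mathbf{X}_b - \mathbf{X}'_b$, for $b = 1, \ldots, B$, has dimension $M_T \times L$. Consequently, the effective codeword difference matrices have the following structure:

$$\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E} = \operatorname{diag}\{\mathbf{E}_b^H\mathbf{E}_b\}_{b=1}^{B}$$

and the corresponding code design criterion follows from (45) as

$$\prod_{k=1}^{m}\lambda_k(\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}) \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{62}$$

for all possible codeword difference matrices $\mathbf{E}$ arising from $\mathcal{C}_r(\mathrm{SNR})$, and some $\epsilon > 0$ constant w.r.t. SNR and $r$. We note that the block-diagonal structure of the effective codeword difference matrices implies that

$$\Big\{\lambda_1(\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}), \ldots, \lambda_{BM_T}(\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}), \underbrace{0, \ldots, 0}_{B(L-M_T)}\Big\} = \bigcup_{b=1}^{B}\Big\{\lambda_1(\mathbf{E}_b^H\mathbf{E}_b), \ldots, \lambda_{M_T}(\mathbf{E}_b^H\mathbf{E}_b), \underbrace{0, \ldots, 0}_{L-M_T}\Big\}. \tag{63}$$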
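Relation (63) and its consequence for per-block designs can be checked numerically. The NumPy sketch below assumes a random difference matrix, illustrative sizes $B = 3$, $L = 4$, $M_T = 2$, and $M_T \le M_R$ (so $m = M_T$). It confirms that the eigenvalues of the block-diagonal effective matrix are the union of the per-block eigenvalues, and that the product of the $m$ smallest nonzero joint eigenvalues never exceeds the smallest per-block product, which is why per-block criteria alone do not imply (62).

```python
import numpy as np

rng = np.random.default_rng(6)
B, Lb, M_T = 3, 4, 2                          # B blocks of Lb slots each (illustrative)
N = B * Lb
R_BF = np.kron(np.eye(B), np.ones((Lb, Lb)))  # I_B ⊗ 1_L
E = rng.standard_normal((M_T, N)) + 1j * rng.standard_normal((M_T, N))
G = R_BF.T * (E.conj().T @ E)                 # block-diagonal effective matrix

eig_joint = np.sort(np.linalg.eigvalsh(G))
blocks = [E[:, b*Lb:(b+1)*Lb] for b in range(B)]
eig_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(Eb.conj().T @ Eb) for Eb in blocks]))

nz = eig_joint[eig_joint > 1e-9]              # the B*M_T nonzero eigenvalues
m = M_T                                       # assumes M_T <= M_R
joint_prod = np.prod(nz[:m])                  # left-hand side of (62) for this E
per_block = [np.prod(np.sort(np.linalg.eigvalsh(Eb.conj().T @ Eb))[-M_T:]) for Eb in blocks]
```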

In the absence of coding across individual blocks, that is, if the codewords are designed so that they satisfy the following per-block criteria obtained from (45)

$$\prod_{i=1}^{m}\lambda_i(\mathbf{E}_b^H\mathbf{E}_b) \ge \mathrm{SNR}^{-(r-\epsilon)},\ \epsilon > 0, \quad \text{for } b = 1, \ldots, B, \tag{64}$$

the design criterion (62) is not guaranteed to be satisfied, because the $m$ smallest nonzero⁴ eigenvalues of $\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}$ are, in general, not equal to the $m$ smallest nonzero eigenvalues of $\mathbf{E}_{b'}^H\mathbf{E}_{b'}$ for some $b' \in \{1, \ldots, B\}$: they may be drawn from different blocks, in which case their product can be smaller than every per-block product in (64). We can therefore conclude that having the individual blocks $\mathbf{E}_b$ satisfy (64) is, in general, not sufficient to ensure DM tradeoff optimality, and coding across blocks is required.

4 Recall that "nonzero eigenvalue" refers to an eigenvalue that is not identically equal to zero for all SNR values.

Interestingly, the situation is different for $M_T = 1$. In this case, we have $m = 1$ so that (62) is given by

$$\lambda_1(\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}) \ge \mathrm{SNR}^{-(r-\epsilon)}, \quad \epsilon > 0. \tag{65}$$

We also note that there is only one nonzero eigenvalue per block, and the per-block design criterion in (64) now reads

$$\lambda_1(\mathbf{E}_b^H\mathbf{E}_b) \ge \mathrm{SNR}^{-(r-\epsilon)},\ \epsilon > 0, \quad \text{for } b = 1, \ldots, B. \tag{66}$$

Since $\lambda_1(\mathbf{R}_{\mathrm{BF}}^T \odot \mathbf{E}^H\mathbf{E}) = \lambda_1(\mathbf{E}_{b'}^H\mathbf{E}_{b'})$ for some $b' \in \{1, \ldots, B\}$, we can conclude that satisfying (66) for all blocks guarantees that (65) is also satisfied.

V. Code design for optimal DM tradeoff

We have established the optimal DM tradeoff for the general class of selective-fading channels and provided a code design criterion for achieving DM tradeoff optimality. The goal of this section is to demonstrate the existence of codes satisfying this design criterion and to provide corresponding systematic design procedures. In addition, we want to ensure that the proposed DM tradeoff optimal code designs are practicable in the sense of being independent of the channel covariance matrix (i.e., of the selectivity characteristics). We shall see that in the single transmit antenna case this is rather straightforward to accomplish. In the case of multiple transmit antennas, we propose a procedure that decouples the problem into the design of a precoder (which can be obtained systematically for a given $\mathbf{R}_{\mathbf{H}}$) and an outer code which has to satisfy a design criterion that is independent of $\mathbf{R}_{\mathbf{H}}$.

A. The single transmit antenna case

Consider the case $M_T = 1$ and $M_R$ general with a corresponding family of codes $\mathcal{C}_r$ of block length $N$. The codewords in $\mathcal{C}_r$ are $1 \times N$ vectors of the form $\mathbf{x} = [x_0 \cdots x_{N-1}]$ with the corresponding effective codeword difference matrices given by

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{e}^H\mathbf{e} = \mathbf{D}_{\mathbf{e}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{e}} \tag{67}$$

so that $\Xi^1(\mathrm{SNR})$ defined in (44) specializes to

$$\Xi^1(\mathrm{SNR}) = \min_{\substack{\mathbf{e}=\mathbf{x}-\mathbf{x}',\,\mathbf{x}\neq\mathbf{x}'\\ \mathbf{x},\mathbf{x}'\in\mathcal{C}_r(\mathrm{SNR})}} \lambda_1(\mathbf{D}_{\mathbf{e}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{e}}). \tag{68}$$
The dependence of (68) on $\mathbf{R}_{\mathbf{H}}$ leads to different code design criteria depending on the channel selectivity characteristics. For example, in a flat-fading channel, where $\mathbf{R}_{\mathbf{H}} = \mathbf{1}_N$, $\rho = 1$, and $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{e}^H\mathbf{e} = \mathbf{e}^H\mathbf{e}$, we have $\Xi^1(\mathrm{SNR}) = \min_{\mathbf{e}\neq\mathbf{0}}\|\mathbf{e}\|^2$. On the other hand, in the fast-fading case where $\mathbf{R}_{\mathbf{H}} = \mathbf{I}_N$ and hence $\rho = N$, it follows from (68) that

$$\Xi^1(\mathrm{SNR}) = \min_{\mathbf{e}\neq\mathbf{0}}\ \min_{n=0,\ldots,N-1}|e_n|^2.$$
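The two special cases can be reproduced for a single codeword difference with a short NumPy sketch. Here `e` is a hypothetical difference vector, and "nonzero" is approximated by a numerical threshold (in the paper it means "not identically zero over SNR"). The smallest nonzero eigenvalue of $\mathbf{D}_{\mathbf{e}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{e}}$ equals $\|\mathbf{e}\|^2$ for the all-ones (flat-fading) covariance and $\min_n|e_n|^2$ for $\mathbf{R}_{\mathbf{H}} = \mathbf{I}_N$ (fast fading).

```python
import numpy as np

def smallest_nonzero_eig(R, e, tol=1e-9):
    """Smallest numerically nonzero eigenvalue of D_e^H R^T D_e, cf. (67)-(68)."""
    D_e = np.diag(e)
    ev = np.linalg.eigvalsh(D_e.conj().T @ R.T @ D_e)
    return ev[ev > tol].min()

e = np.array([1 + 1j, -0.5, 0.25j, 2.0])           # hypothetical codeword difference, N = 4
N = e.size
flat = smallest_nonzero_eig(np.ones((N, N)), e)    # flat fading: rank-1 all-ones covariance
fast = smallest_nonzero_eig(np.eye(N), e)          # fast fading: R_H = I_N
```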

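We shall next provide a code design criterion which guarantees DM tradeoff optimality irrespective of $\mathbf{R}_{\mathbf{H}}$.

Proposition 2: The family of codes $\mathcal{C}_r$ is DM tradeoff optimal for $M_T = 1$ if it satisfies

$$\min_{\substack{\mathbf{e}=\mathbf{x}-\mathbf{x}',\,\mathbf{x}\neq\mathbf{x}'\\ \mathbf{x},\mathbf{x}'\in\mathcal{C}_r(\mathrm{SNR})}}\ \min_{n}|e_n|^2 \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{69}$$

for some $\epsilon > 0$ constant w.r.t. SNR and $r$.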

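Proof: Applying Ostrowski's Theorem [18, Theorem 4.5.9] to the effective codeword difference matrix (67) and using $\lambda_k(\mathbf{R}_{\mathbf{H}}^T) = \lambda_k(\mathbf{R}_{\mathbf{H}})$ yields $\lambda_n(\mathbf{D}_{\mathbf{e}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{e}}) = \theta_n\lambda_n(\mathbf{R}_{\mathbf{H}})$, $n = 0, \ldots, N-1$, where $\theta_n \in [\min_n|e_n|^2, \max_n|e_n|^2]$. Hence, by (69), we have

$$\lambda_k(\mathbf{D}_{\mathbf{e}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{e}}) \ge \mathrm{SNR}^{-(r-\epsilon)}\lambda_k(\mathbf{R}_{\mathbf{H}}), \quad k = 0, \ldots, N-1 \tag{70}$$

for all $\mathbf{e} \neq \mathbf{0}$. Since the eigenvalues of $\mathbf{R}_{\mathbf{H}}$ are constant w.r.t. SNR, we conclude from (70) that $\Xi^1(\mathrm{SNR}) \ge \mathrm{SNR}^{-(r-\epsilon)}$ (up to a constant factor that does not affect the SNR exponent), implying by (45) that $\mathcal{C}_r$ is DM tradeoff optimal. ■

Since the minimum distance in a QAM constellation scales as $d_{\min}^2 \doteq \mathrm{SNR}^{-r}$ [15, Sec. 9.1.2], using uncoded QAM constellations with $\mathrm{SNR}^r$ points in each slot $n = 0, \ldots, N-1$ satisfies (69) for $\epsilon \to 0$. We can therefore conclude from Proposition 2 that in the single transmit antenna case uncoded QAM is DM tradeoff optimal irrespective of $\mathbf{R}_{\mathbf{H}}$.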
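The scaling $d_{\min}^2 \doteq \mathrm{SNR}^{-r}$ can be illustrated with unit-average-energy square QAM. In the sketch below, `square_qam` and `min_sq_distance` are hypothetical helpers, $r = 1$ is an arbitrary choice, and the SNR values are chosen so that $\mathrm{SNR}^r$ is a power of 4 (a square constellation). The product $d_{\min}^2\,\mathrm{SNR}^r = 6M/(M-1)$ stays between 6 and 8, i.e., it does not depend on SNR to first order in the exponent.

```python
import numpy as np

def square_qam(num_points):
    """Unit-average-energy square QAM with num_points = 4**k points."""
    side = int(round(np.sqrt(num_points)))
    levels = np.arange(-(side - 1), side, 2)
    pts = (levels[:, None] + 1j * levels[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def min_sq_distance(pts):
    """Smallest squared Euclidean distance between distinct constellation points."""
    diff = np.abs(pts[:, None] - pts[None, :]) ** 2
    return diff[~np.eye(pts.size, dtype=bool)].min()

r = 1.0
snrs = [4.0, 16.0, 64.0, 256.0]        # chosen so that SNR**r is a power of 4
scaled = [min_sq_distance(square_qam(int(s ** r))) * s ** r for s in snrs]
```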

B. Multiple transmit antennas 

For multiple transmit antennas, the situation is more complicated. We next describe a procedure that decouples the problem of designing DM tradeoff optimal codes for multiple transmit antennas into the design of a precoder depending on $\mathbf{R}_{\mathbf{H}}$ and an outer code which has to satisfy a design criterion that is independent of $\mathbf{R}_{\mathbf{H}}$. Specifically, we shall see that the precoder can be chosen such that the criterion to be satisfied by the outer code boils down to a criterion well known in the literature, with corresponding optimal code designs available.
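We consider families (w.r.t. SNR) of codes of block length $N$ for which the $M_T \times N$ codeword matrices are given by

$$\mathbf{X} = \mathbf{P} \odot \tilde{\mathbf{X}}. \tag{71}$$

The matrix $\mathbf{P}$ can be thought of as an inner code, or precoder, and $\tilde{\mathbf{X}}$ can be interpreted as a codeword matrix belonging to an outer family of codes $\tilde{\mathcal{C}}_r$. In what follows, we shall refer to $\tilde{\mathcal{C}}_r$ simply as a family of codes.

If $\tilde{\mathbf{X}}, \tilde{\mathbf{X}}' \in \tilde{\mathcal{C}}_r(\mathrm{SNR})$, the corresponding precoded codeword difference matrix is given by $\mathbf{E} = \mathbf{P} \odot \tilde{\mathbf{E}}$, where $\tilde{\mathbf{E}} = \tilde{\mathbf{X}} - \tilde{\mathbf{X}}'$. With the rows of $\tilde{\mathbf{E}}$ and $\mathbf{P}$ denoted as $\tilde{\mathbf{e}}^{(l)}$ and $\mathbf{p}^{(l)}$, respectively, we have

$$\mathbf{E}^H\mathbf{E} = \sum_{l=1}^{M_T}\mathbf{p}^{(l)H}\mathbf{p}^{(l)} \odot \tilde{\mathbf{e}}^{(l)H}\tilde{\mathbf{e}}^{(l)}.$$

Defining

$$\mathbf{R}_l = \mathbf{D}_{\mathbf{p}^{(l)}}^H\mathbf{R}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{p}^{(l)}}, \quad l = 1, \ldots, M_T \tag{72}$$

and using $\mathbf{R}_l \odot \tilde{\mathbf{e}}^{(l)H}\tilde{\mathbf{e}}^{(l)} = \mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}^H\mathbf{R}_l\mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}$ ($l = 1, \ldots, M_T$), the effective codeword difference matrix is given by

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \sum_{l=1}^{M_T}\mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}^H\mathbf{R}_l\mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}. \tag{73}$$

Consequently, the code design criterion in Theorem 1 specializes to

$$\Xi^m(\mathrm{SNR}) = \min_{\substack{\tilde{\mathbf{E}}=\tilde{\mathbf{X}}-\tilde{\mathbf{X}}',\,\tilde{\mathbf{X}}\neq\tilde{\mathbf{X}}'\\ \tilde{\mathbf{X}},\tilde{\mathbf{X}}'\in\tilde{\mathcal{C}}_r(\mathrm{SNR})}}\ \prod_{k=1}^{m}\lambda_k\Big(\sum_{l=1}^{M_T}\mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}^H\mathbf{R}_l\mathbf{D}_{\tilde{\mathbf{e}}^{(l)}}\Big) \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{74}$$

for some $\epsilon > 0$ constant w.r.t. SNR and $r$. We shall next formalize our main result in the context of code design for selective-fading MIMO channels.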

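Theorem 2: Consider a family of codes $\tilde{\mathcal{C}}_r$, $r \in [0, m]$, of block length $N \ge \rho M_T$. Let the transmit signal corresponding to antenna $l$, for $l = 1, \ldots, M_T$, be given by $\mathbf{x}^{(l)} = \mathbf{p}^{(l)} \odot \tilde{\mathbf{x}}$, where $\tilde{\mathbf{x}} = [\tilde{x}_0 \cdots \tilde{x}_{N-1}]$ is a codeword in $\tilde{\mathcal{C}}_r(\mathrm{SNR})$ and $\mathbf{p}^{(l)}$ is the $l$th row of the $M_T \times N$ precoding matrix $\mathbf{P}$. If, for some $\epsilon > 0$ constant w.r.t. SNR and $r$, $\tilde{\mathcal{C}}_r$ satisfies

$$\min_{\substack{\tilde{\mathbf{e}}=\tilde{\mathbf{x}}-\tilde{\mathbf{x}}',\,\tilde{\mathbf{x}}\neq\tilde{\mathbf{x}}'\\ \tilde{\mathbf{x}},\tilde{\mathbf{x}}'\in\tilde{\mathcal{C}}_r(\mathrm{SNR})}}\ \prod_{n=0}^{m-1}|\tilde{e}_{\pi(n)}|^2 \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{75}$$

where $\pi$ is the (SNR-dependent) permutation that sorts the entries of $\tilde{\mathbf{e}}$ in ascending order of magnitude for every SNR level⁵, and $\mathbf{P}$ is such that

$$\operatorname{rank}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) = \rho M_T \tag{76}$$

then the pair of inner and outer codes $(\mathbf{P}, \tilde{\mathcal{C}}_r)$ satisfies the code design criterion (45) in Theorem 1.

5 Recall that the entries of $\tilde{\mathbf{e}}$ depend on SNR.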

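Proof: We start by noting that since the same $1 \times N$ codeword $\tilde{\mathbf{x}}$ is transmitted over all antennas, we have $\tilde{\mathbf{e}}^{(l)} = \tilde{\mathbf{e}}$ for all $l = 1, \ldots, M_T$, which, upon inserting into (73), yields

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \sum_{l=1}^{M_T}\mathbf{D}_{\tilde{\mathbf{e}}}^H\mathbf{R}_l\mathbf{D}_{\tilde{\mathbf{e}}} = \mathbf{D}_{\tilde{\mathbf{e}}}^H(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P})\mathbf{D}_{\tilde{\mathbf{e}}}. \tag{77}$$

Condition (76) implies that exactly $\rho M_T$ eigenvalues of $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}$ are nonzero (recall that $N \ge \rho M_T$, so that the bound $\operatorname{rank}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) \le \min(N, \rho M_T) = \rho M_T$ is not limited by the block length $N$). With the eigenvalue decomposition $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P} = \mathbf{V}\mathbf{S}\mathbf{V}^H$, where $\mathbf{S} = \operatorname{diag}\{\boldsymbol{\Sigma}, \mathbf{0}\}$, $\boldsymbol{\Sigma} = \operatorname{diag}\{\sigma_0, \ldots, \sigma_{\rho M_T-1}\}$, and the nonzero eigenvalues $\sigma_i$ are sorted in ascending order, we get $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E} = \mathbf{D}_{\tilde{\mathbf{e}}}^H\mathbf{V}\mathbf{S}\mathbf{V}^H\mathbf{D}_{\tilde{\mathbf{e}}}$. Using the fact that $\lambda_n(\mathbf{M}\mathbf{M}^H) = \lambda_n(\mathbf{M}^H\mathbf{M})$, $\forall n$, for a square matrix $\mathbf{M}$, we obtain

$$\lambda_n(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}) = \lambda_n\big(\mathbf{S}^{1/2}\underbrace{\mathbf{V}^H\mathbf{D}_{\tilde{\mathbf{e}}}\mathbf{D}_{\tilde{\mathbf{e}}}^H\mathbf{V}}_{\mathbf{B}}\mathbf{S}^{1/2}\big)$$

$$= \lambda_n(\boldsymbol{\Sigma}^{1/2}\bar{\mathbf{B}}\boldsymbol{\Sigma}^{1/2}) \tag{78}$$

$$\ge \sigma_0\lambda_n(\bar{\mathbf{B}}) \tag{79}$$

for the nonzero eigenvalues of $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}$, i.e., for $n = 0, \ldots, \rho M_T - 1$. Here, $\bar{\mathbf{B}} = \mathbf{B}([1:\rho M_T], [1:\rho M_T])$, and (79) follows by applying Ostrowski's Theorem [18, Theorem 4.5.9]. Since $\mathbf{B}$ is Hermitian and $\bar{\mathbf{B}}$ is its principal submatrix obtained by deleting the $N - \rho M_T$ last rows and the corresponding columns in $\mathbf{B}$, we can invoke [18, Theorem 4.3.15] to conclude that

$$\lambda_k(\bar{\mathbf{B}}) \ge \lambda_k(\mathbf{B}) = |\tilde{e}_{\pi(k)}|^2, \quad k = 0, \ldots, \rho M_T - 1 \tag{80}$$

where $\pi$ is the (SNR-dependent) permutation that sorts the entries of $\tilde{\mathbf{e}}$ in ascending order for every SNR value. Next, combining (79) with (80), we find that the nonzero⁶ eigenvalues of $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}$ satisfy

$$\lambda_k(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}) \ge \sigma_0|\tilde{e}_{\pi(k)}|^2, \quad k = 0, \ldots, \rho M_T - 1. \tag{81}$$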

6 Recall that "nonzero eigenvalue" refers to an eigenvalue that is not identically equal to zero for all SNR values. 
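By (75), we can therefore conclude that

$$\Xi^m(\mathrm{SNR}) = \min_{\substack{\mathbf{E}=\mathbf{X}-\mathbf{X}',\,\mathbf{X}\neq\mathbf{X}'\\ \mathbf{X},\mathbf{X}'\in\mathcal{C}_r(\mathrm{SNR})}}\ \prod_{k=0}^{m-1}\lambda_k(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{E}^H\mathbf{E}) \ge (\sigma_0)^m \min_{\substack{\tilde{\mathbf{e}}=\tilde{\mathbf{x}}-\tilde{\mathbf{x}}',\,\tilde{\mathbf{x}}\neq\tilde{\mathbf{x}}'\\ \tilde{\mathbf{x}},\tilde{\mathbf{x}}'\in\tilde{\mathcal{C}}_r(\mathrm{SNR})}}\ \prod_{n=0}^{m-1}|\tilde{e}_{\pi(n)}|^2 \ge \mathrm{SNR}^{-(r-\epsilon)}.$$

■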
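The key steps (77) and (81) of the proof can be checked numerically for one random instance. The sketch below assumes a hypothetical rank-$\rho$ covariance, a random unit-modulus precoder (a generic stand-in, not the structured design of Section VI), and a random outer-code difference vector $\tilde{\mathbf{e}}$. It verifies (76) for this $\mathbf{P}$, the factorization (77), and the eigenvalue lower bound (81).

```python
import numpy as np

rng = np.random.default_rng(7)
N, M_T, rho = 8, 2, 3

# Hypothetical rank-rho covariance and a generic unit-modulus precoder P (M_T x N)
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
R = (U[:, :rho] * rng.uniform(0.5, 2.0, rho)) @ U[:, :rho].conj().T
P = np.exp(2j * np.pi * rng.uniform(size=(M_T, N)))

def nonzero_eigs(G, count):
    """The `count` largest eigenvalues of a PSD matrix of rank `count`, in ascending order."""
    return np.sort(np.linalg.eigvalsh(G))[-count:]

Q = R.T * (P.conj().T @ P)                     # R_H^T ⊙ P^H P
rank_Q = np.linalg.matrix_rank(Q, tol=1e-8)   # condition (76)
sigma0 = nonzero_eigs(Q, rho * M_T)[0]        # smallest nonzero eigenvalue

e_tilde = rng.standard_normal(N) + 1j * rng.standard_normal(N)
E = P * e_tilde                                # row l: p^(l) ⊙ e_tilde (same outer codeword)
G = R.T * (E.conj().T @ E)                     # effective matrix R_H^T ⊙ E^H E
D_e = np.diag(e_tilde)
factor_ok = np.allclose(G, D_e.conj().T @ Q @ D_e)            # (77)

lam = nonzero_eigs(G, rho * M_T)
bound = sigma0 * np.sort(np.abs(e_tilde) ** 2)[:rho * M_T]   # right-hand side of (81)
```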

The precoder $\mathbf{P}$ effectively decorrelates the channel into its independent diversity branches; the resulting design criterion (75) for the outer family of codes is satisfied by the QAM-based permutation codes proposed in [8] in the context of parallel channels. To see this, we start by recalling that the problem addressed in [8, Sec. V.B] is the construction of space-only codes, i.e., $N = 1$, that are approximately universal over a parallel channel with $L$ independent flat-fading subchannels. The code construction presented in [8] is based on permutations of QAM constellations. In order to sustain a rate of $R(\mathrm{SNR})$ over the parallel channel, each subchannel has as input alphabet a QAM constellation $\mathcal{A}(\mathrm{SNR})$ with $2^{R(\mathrm{SNR})}$ points. A permutation code across the $L$ subchannels can be represented as

$$\Pi(\mathrm{SNR}) = \{\mathbf{x} = [\pi_1(q) \cdots \pi_L(q)],\ q \in \mathcal{A}(\mathrm{SNR})\} \tag{82}$$

where $\mathcal{A}$ is the family of QAM constellations defined in (29) and the $\pi_l$, $l = 1, \ldots, L$, are permutations of the constellation elements in $\mathcal{A}(\mathrm{SNR})$. A remarkable result given in [8, Theorem 5.2] says that there exist permutations $\pi_l$, $l = 1, \ldots, L$, such that $\Pi$ in (82) constitutes an approximately universal code for the parallel channel. By [8, Theorem 5.1], such a family of codes $\Pi$ satisfies the following condition. Let $\mathbf{x}$ denote a codeword in $\Pi(\mathrm{SNR})$ as defined in (82), and denote the corresponding codeword difference vectors by $\mathbf{e} = \mathbf{x} - \mathbf{x}'$, $\mathbf{x} \neq \mathbf{x}'$, $\mathbf{x}, \mathbf{x}' \in \Pi(\mathrm{SNR})$. Then, the approximately universal family of codes $\Pi$ satisfies [8, Eq. (24)], i.e.,

$$|e(1)|^2 \cdots |e(L)|^2 \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{83}$$

for all $\mathbf{e} \neq \mathbf{0}$ arising from $\Pi(\mathrm{SNR})$ and some $\epsilon > 0$ that is constant w.r.t. SNR and $r$.

Mapping the spatial dimension in (83) to time-frequency slots and setting $L = N$, it follows from [8, Th. 5.2] and (83) that there exist families of permutation codes $\Pi$ as given in (82) (now $\pi_n(q)$, $n = 0, \ldots, N-1$, denotes the symbol transmitted in time-frequency slot $n$) that satisfy

$$|e(1)|^2 \cdots |e(N)|^2 \ge \mathrm{SNR}^{-(r-\epsilon)} \tag{84}$$

for all $\mathbf{e} \neq \mathbf{0}$ arising from $\Pi(\mathrm{SNR})$ and some $\epsilon > 0$ constant w.r.t. SNR and $r$. Due to the power constraint (28) on the codewords of $\tilde{\mathcal{C}}_r$, we necessarily have $|e(n)|^2 \le 1$ for all $n$, so that (75) is satisfied: dropping factors that do not exceed one cannot decrease a product, hence the product of the $m$ smallest entries $|\tilde{e}_{\pi(n)}|^2$ is at least the product of all $N$ entries in (84). We can therefore conclude that the design criterion in Theorem 2 for the family of codes $\tilde{\mathcal{C}}_r$ can be satisfied using the QAM-based permutation codes proposed in [8]. We emphasize, however, that here coding is performed over time and frequency, as opposed to [8], where coding is performed across parallel channels.
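To see why permuting constellation points across slots helps the product-distance criterion, here is a toy brute-force illustration. It is not the construction of [8]: it assumes a unit-energy 4-PAM alphabet (the smallest case where the effect is visible), two slots, and an exhaustive search over the 24 permutations for the second slot. Repeating the same symbol in both slots yields a minimum product distance of $(4/5)^2$, while the best permutation raises it to $64/25$.

```python
import itertools
import numpy as np

# Toy stand-in for the constellation A(SNR): unit-average-energy 4-PAM
pam = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)

def min_product_distance(perms):
    """min over q != q' of prod_l |pi_l(q) - pi_l(q')|^2 for the code {[pi_1(q) ... pi_L(q)]}."""
    code = np.array([[pam[p[q]] for p in perms] for q in range(pam.size)])
    return min(np.prod(np.abs(code[a] - code[b]) ** 2)
               for a, b in itertools.combinations(range(pam.size), 2))

identity = tuple(range(pam.size))
repetition = min_product_distance([identity, identity])      # same symbol in both slots
best_perm = max(itertools.permutations(range(pam.size)),
                key=lambda p: min_product_distance([identity, p]))
permuted = min_product_distance([identity, best_perm])
```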

VI. Precoder design 
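It remains to show that, given $\mathbf{R}_{\mathbf{H}}$, we can find a precoder $\mathbf{P}$ such that

$$\operatorname{rank}(\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) = \rho M_T. \tag{85}$$

Using the eigenvalue decomposition $\mathbf{R}_{\mathbf{H}} = \sum_{n=0}^{\rho-1}\lambda_n\mathbf{u}_n\mathbf{u}_n^H$, we note that

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P} = \Big(\sum_{n=0}^{\rho-1}\lambda_n\mathbf{u}_n^*\mathbf{u}_n^T\Big) \odot \Big(\sum_{l=1}^{M_T}\mathbf{p}^{(l)H}\mathbf{p}^{(l)}\Big) = \sum_{n=0}^{\rho-1}\sum_{l=1}^{M_T}\lambda_n\boldsymbol{\alpha}_{n,l}\boldsymbol{\alpha}_{n,l}^H \tag{86}$$

with $\boldsymbol{\alpha}_{n,l} = \mathbf{u}_n^* \odot \mathbf{p}^{(l)H}$. The task of designing a precoder that satisfies (85) thus amounts to finding $\mathbf{p}^{(l)}$, $l = 1, \ldots, M_T$, such that the corresponding $\boldsymbol{\alpha}_{n,l}$ are linearly independent. Imposing structure on $\mathbf{R}_{\mathbf{H}}$ allows us to be more specific about how to design the precoder, as the following example illustrates.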

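Example: Consider the case of cyclic ISI channels (e.g., OFDM modulation) with $M_T = 2$, $L = 2$, and $N = 4$. Using (56), the corresponding covariance matrix is obtained as $\mathbf{R}_{\mathbf{H}} = \lambda_0\boldsymbol{\psi}_0\boldsymbol{\psi}_0^H + \lambda_1\boldsymbol{\psi}_1\boldsymbol{\psi}_1^H$, where the eigenvectors of $\mathbf{R}_{\mathbf{H}}$ are simply columns of the FFT matrix $\boldsymbol{\Psi} = [\boldsymbol{\psi}_0\ \boldsymbol{\psi}_1\ \boldsymbol{\psi}_2\ \boldsymbol{\psi}_3]$, i.e., $\mathbf{u}_n = \boldsymbol{\psi}_n$, $n = 0, \ldots, 3$. One possibility to obtain a set of linearly independent vectors $\boldsymbol{\alpha}_{n,l}$ in (86) is to set

$$\mathbf{p}^{(l)} = \sqrt{N}\,\boldsymbol{\psi}_{2(l-1)}^T, \quad l = 1, 2. \tag{87}$$

More concretely, invoking

$$\mathbf{D}_{\sqrt{N}\boldsymbol{\psi}_m}\boldsymbol{\psi}_n = \boldsymbol{\psi}_{(n+m) \bmod N}$$

the precoder defined through (87) results in

$$\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P} = \mathbf{D}_{\sqrt{N}\boldsymbol{\psi}_0}^*\big(\lambda_0\boldsymbol{\psi}_0^*\boldsymbol{\psi}_0^T + \lambda_1\boldsymbol{\psi}_1^*\boldsymbol{\psi}_1^T\big)\mathbf{D}_{\sqrt{N}\boldsymbol{\psi}_0} + \mathbf{D}_{\sqrt{N}\boldsymbol{\psi}_2}^*\big(\lambda_0\boldsymbol{\psi}_0^*\boldsymbol{\psi}_0^T + \lambda_1\boldsymbol{\psi}_1^*\boldsymbol{\psi}_1^T\big)\mathbf{D}_{\sqrt{N}\boldsymbol{\psi}_2} = \boldsymbol{\Psi}^*\operatorname{diag}\{\lambda_0, \lambda_1, \lambda_0, \lambda_1\}\boldsymbol{\Psi}^T$$

which is clearly a full-rank matrix. Note that this precoder simply amounts to performing (cyclic) delay diversity.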
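The example can be reproduced numerically. The sketch assumes the unitary FFT convention $[\boldsymbol{\Psi}]_{k,n} = e^{-j2\pi kn/N}/\sqrt{N}$, the $\sqrt{N}$ normalization in (87) (which makes the precoder entries unit-modulus), and arbitrary illustrative values for $\lambda_0$ and $\lambda_1$. It checks that $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P} = \boldsymbol{\Psi}^*\operatorname{diag}\{\lambda_0,\lambda_1,\lambda_0,\lambda_1\}\boldsymbol{\Psi}^T$ has full rank 4, although $\mathbf{R}_{\mathbf{H}}$ itself has rank 2.

```python
import numpy as np

N = 4
k = np.arange(N)
Psi = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)   # columns psi_0, ..., psi_3
lam0, lam1 = 1.0, 0.7                                          # hypothetical eigenvalues of R_H
R = lam0 * np.outer(Psi[:, 0], Psi[:, 0].conj()) + lam1 * np.outer(Psi[:, 1], Psi[:, 1].conj())

# Precoder (87): p^(l) = sqrt(N) psi_{2(l-1)}^T, l = 1, 2 (unit-modulus entries)
P = np.sqrt(N) * np.vstack([Psi[:, 0], Psi[:, 2]])
G = R.T * (P.conj().T @ P)                                     # R_H^T ⊙ P^H P

expected = Psi.conj() @ np.diag([lam0, lam1, lam0, lam1]) @ Psi.T
```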

We next consider general time-frequency selective channels, where the corresponding covariance matrix $\mathbf{R}_{\mathbf{H}}$ is two-level Toeplitz⁷ as a consequence of the stationarity of $L_{\mathbf{H}}(t, f)$ in $t$ and $f$. In this case, it seems difficult to devise a general analytic procedure for constructing $\mathbf{P}$ for a given $\mathbf{R}_{\mathbf{H}}$ such that (85) is satisfied. We can, however, exploit the asymptotic equivalence of two-level Toeplitz and two-level circulant matrices to satisfy (85) asymptotically in the block length $N$. In particular, we will need the following result.

Theorem 3 (Asymptotic Eigenvalue Distribution [22]-[24]): The distribution of the eigenvalues of $\mathbf{R}_{\mathbf{H}}$ for $M, K \to \infty$, where $M$ and $K$ are related to the block length $N$ by the mapping (11), is given by

$$S(\xi, \mu) = \sum_{m=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} R_{\mathbf{H}}(mT, kF)\, e^{-j2\pi(\mu m - \xi k)}.$$

In what follows, we design the precoder $\mathbf{P}$ based on a (two-level) circulant approximation $\mathbf{C}_{\mathbf{H}}$ of the (two-level) Toeplitz covariance matrix $\mathbf{R}_{\mathbf{H}}$. Specifically, we take the matrix $\mathbf{C}_{\mathbf{H}}$ such that its eigenvalues are uniformly spaced samples of the asymptotic eigenvalue distribution $S(\xi, \mu)$ of $\mathbf{R}_{\mathbf{H}}$. This implies that $\mathbf{C}_{\mathbf{H}}$ and $\mathbf{R}_{\mathbf{H}}$ are asymptotically (in block length $N$) equivalent [22, Lemma 11], [23, Lemma 1] and that their eigenvalues are asymptotically equally distributed⁸ [22, Theorem 9], [23, Theorem 1]. In cases where the signal model is (two-level) circulant [14], [16], this approach gives exact results for any block length $N$ because $\mathbf{R}_{\mathbf{H}}$ is (two-level) circulant for any $K$ and $M$. For general (two-level) Toeplitz covariance matrices $\mathbf{R}_{\mathbf{H}}$, this approach is meaningful because the asymptotic equivalence of $\mathbf{C}_{\mathbf{H}}$ and $\mathbf{R}_{\mathbf{H}}$ implies asymptotic equivalence of $\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}$ and $\mathbf{R}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}$.

7 A two-level Toeplitz matrix is a block Toeplitz matrix with Toeplitz blocks. Similarly, a two-level circulant matrix is a block circulant matrix with circulant blocks.

8 The interested reader is referred to [22, Theorem 4] (respectively, [23, Theorem 2]) for a formal definition of the concept of asymptotically equally distributed one-dimensional (or two-dimensional) sequences.
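We start by defining the (two-level) circulant matrix

$$\mathbf{C}_{\mathbf{H}} = \mathbf{F}\boldsymbol{\Lambda}\mathbf{F}^H$$

where $\mathbf{F} = \boldsymbol{\Psi} \otimes \boldsymbol{\Phi}$, with $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$ denoting the $M \times M$ and $K \times K$ FFT matrices, respectively, and $\boldsymbol{\Lambda} = \operatorname{diag}\{\lambda_n(\mathbf{C}_{\mathbf{H}})\}_{n=0}^{N-1}$, with

$$\lambda_n(\mathbf{C}_{\mathbf{H}}) = S\Big(\frac{k}{K}, \frac{m}{M}\Big), \quad m = 0, \ldots, M-1,\ k = 0, \ldots, K-1 \tag{88}$$

where we have used the mapping $n = \mathcal{M}(m, k)$ defined in (11). Because the scattering function is assumed to be compactly supported in the rectangle $[0, \tau_0] \times [0, \nu_0]$, $S(\xi, \mu)$ is also compactly supported, and hence the nonzero eigenvalues of $\mathbf{C}_{\mathbf{H}}$ in (88) are indexed by

$$(m, k) \in \{0, \ldots, \nu - 1\} \times \{0, \ldots, \tau - 1\} \tag{89}$$

where

$$\nu = \lceil \nu_0 MT \rceil \quad\text{and}\quad \tau = \lceil \tau_0 FK \rceil. \tag{90}$$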

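Next, we propose a precoder tailored to $\mathbf{C}_{\mathbf{H}}$ that achieves $\operatorname{rank}(\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) = \rho M_T$. The main idea underlying this construction is to design $\mathbf{P}$ such that the precoder effectively induces appropriately chosen time-frequency shifts.

Proposition 3: Consider the $N \times N$ matrix $\mathbf{C}_{\mathbf{H}} = \mathbf{F}\boldsymbol{\Lambda}\mathbf{F}^H$, where $\mathbf{F} = \boldsymbol{\Psi} \otimes \boldsymbol{\Phi}$ ($\boldsymbol{\Psi}$, $\boldsymbol{\Phi}$ are the $M \times M$ and $K \times K$ FFT matrices, respectively) and $\boldsymbol{\Lambda}$ has $\rho = \nu\tau$ nonzero diagonal elements. If $N \ge \rho M_T$ and $\mathbf{P}$ satisfies

$$\mathbf{p}^{(l)} = \sqrt{N}\,(\boldsymbol{\psi}_{p_l\nu} \otimes \boldsymbol{\phi}_{q_l\tau})^T, \quad l = 1, \ldots, M_T \tag{91}$$

where $\boldsymbol{\psi}_m$ and $\boldsymbol{\phi}_k$ are, respectively, the $m$th and $k$th columns of $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$, and

$$(p_l, q_l) \in \Big\{0, \ldots, \Big\lfloor\frac{M}{\nu}\Big\rfloor - 1\Big\} \times \Big\{0, \ldots, \Big\lfloor\frac{K}{\tau}\Big\rfloor - 1\Big\}, \quad (p_l, q_l) \neq (p_{l'}, q_{l'}) \text{ for } l \neq l' \tag{92}$$

then $\operatorname{rank}(\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) = \rho M_T$.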

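Proof: We start by noting that $\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}$ can be written as

$$\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P} = \sum_{l=1}^{M_T}\underbrace{\mathbf{D}_{\mathbf{p}^{(l)}}^H\mathbf{C}_{\mathbf{H}}^T\mathbf{D}_{\mathbf{p}^{(l)}}}_{\mathbf{Q}_l}. \tag{93}$$

Next, consider the following similarity transformation

$$\mathbf{F}^T\mathbf{Q}_l\mathbf{F}^* = \mathbf{F}^T\mathbf{D}_{\mathbf{p}^{(l)}}^*\mathbf{F}^*\boldsymbol{\Lambda}\mathbf{F}^T\mathbf{D}_{\mathbf{p}^{(l)}}\mathbf{F}^* \tag{94}$$

where we have used $\mathbf{C}_{\mathbf{H}} = \mathbf{F}\boldsymbol{\Lambda}\mathbf{F}^H$. With (91) and $\mathbf{F} = \boldsymbol{\Psi} \otimes \boldsymbol{\Phi}$, we get

$$\mathbf{F}^T\mathbf{D}_{\mathbf{p}^{(l)}}^*\mathbf{F}^* = \boldsymbol{\Pi}^{p_l\nu} \otimes \boldsymbol{\Pi}^{q_l\tau} \tag{95}$$

where $\boldsymbol{\Pi} = [\boldsymbol{\pi}_1 \cdots \boldsymbol{\pi}_{N-1}\ \boldsymbol{\pi}_0]$, with $\boldsymbol{\pi}_n = [0 \cdots 1 \cdots 0]^T$ containing a 1 in its $n$th position, is the circulant permutation matrix (of dimension $M \times M$ in the first and $K \times K$ in the second Kronecker factor). Using (95) in (94), we obtain

$$\mathbf{F}^T\mathbf{Q}_l\mathbf{F}^* = (\boldsymbol{\Pi}^{p_l\nu} \otimes \boldsymbol{\Pi}^{q_l\tau})\,\boldsymbol{\Lambda}\,(\boldsymbol{\Pi}^{p_l\nu} \otimes \boldsymbol{\Pi}^{q_l\tau})^T \tag{96}$$

and consequently

$$\mathbf{F}^T(\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P})\mathbf{F}^* = \sum_{l=1}^{M_T}(\boldsymbol{\Pi}^{p_l\nu} \otimes \boldsymbol{\Pi}^{q_l\tau})\,\boldsymbol{\Lambda}\,(\boldsymbol{\Pi}^{p_l\nu} \otimes \boldsymbol{\Pi}^{q_l\tau})^T. \tag{97}$$

Since $(\boldsymbol{\Pi}^k \otimes \boldsymbol{\Pi}^{k'})\boldsymbol{\Lambda}(\boldsymbol{\Pi}^k \otimes \boldsymbol{\Pi}^{k'})^T$ simply permutes the entries of $\boldsymbol{\Lambda}$ along the main diagonal, the rank of $\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}$ is trivially bounded above by $\rho M_T$. To achieve this maximum rank, we need to ensure that the different shifts in (97) distribute the $\rho$ eigenvalues of $\mathbf{C}_{\mathbf{H}}$ into mutually orthogonal subspaces. This can be accomplished as follows. With (89) and (96), we find that the indices $(m, k)$ corresponding to the nonzero eigenvalues of $\mathbf{Q}_l$ are given by the set

$$\mathcal{I}_l = \{p_l\nu, \ldots, (p_l+1)\nu - 1\} \times \{q_l\tau, \ldots, (q_l+1)\tau - 1\}$$

that is, the nonzero eigenvalues of $\mathbf{Q}_l$ are obtained by cyclically shifting the eigenvalues of $\mathbf{C}_{\mathbf{H}}$ by $p_l\nu$ positions along index $m$ and $q_l\tau$ positions along index $k$. The condition in (92) guarantees that $\mathcal{I}_l \cap \mathcal{I}_{l'} = \emptyset$ for $l \neq l'$, which, together with $\rho = \nu\tau$, in turn ensures that $\operatorname{rank}(\mathbf{C}_{\mathbf{H}}^T \odot \mathbf{P}^H\mathbf{P}) = \rho M_T$.

■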
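Proposition 3 can be illustrated on a small instance. The NumPy sketch below makes several assumptions that are labeled in the code: the index mapping $n = mK + k$ (matching the Kronecker ordering of $\boldsymbol{\Psi} \otimes \boldsymbol{\Phi}$), the unitary FFT convention, illustrative sizes $M = 6$, $K = 4$, $\nu = \tau = 2$, and random positive values for the $\rho = \nu\tau$ nonzero eigenvalues. With three distinct shift pairs satisfying (92), the rank is $3\rho$; if two antennas share a shift, the rank drops.

```python
import numpy as np

def fft_matrix(n):
    """Unitary DFT matrix with entries exp(-j 2 pi a b / n) / sqrt(n)."""
    a = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(a, a) / n) / np.sqrt(n)

M, K, nu, tau = 6, 4, 2, 2                # illustrative sizes; rho = nu * tau = 4
N, rho = M * K, nu * tau
Psi, Phi = fft_matrix(M), fft_matrix(K)
F = np.kron(Psi, Phi)                     # assumes the mapping n = m*K + k

rng = np.random.default_rng(8)
lam = np.zeros((M, K))
lam[:nu, :tau] = rng.uniform(0.5, 1.5, size=(nu, tau))   # hypothetical nonzero eigenvalues, cf. (89)
C = F @ np.diag(lam.ravel()) @ F.conj().T                 # two-level circulant C_H

def precoder(shifts):
    """Rows p^(l) = sqrt(N) (psi_{p nu} ⊗ phi_{q tau})^T as in (91)."""
    return np.array([np.sqrt(N) * np.kron(Psi[:, p * nu], Phi[:, q * tau]) for p, q in shifts])

def effective_rank(shifts):
    P = precoder(shifts)
    return np.linalg.matrix_rank(C.T * (P.conj().T @ P), tol=1e-8)   # rank(C_H^T ⊙ P^H P)

disjoint = [(0, 0), (1, 0), (0, 1)]       # distinct (p_l, q_l) pairs, as required by (92)
colliding = [(0, 0), (0, 0), (1, 0)]      # two antennas share a shift: (92) violated
```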

We finally note that the precoder described in Proposition 3 is a generalization of well-known transmit diversity techniques that convert spatial diversity into time or frequency diversity [25]-[27]. This can be seen as follows. From (91), we note that the precoder $\mathbf{P}$ amounts to multiplying the signal transmitted from the $l$th antenna by

$$p^{(l)}(n) = \exp\Big(-j2\pi\Big(p_l\nu\frac{m}{M} + q_l\tau\frac{k}{K}\Big)\Big), \quad n = 0, \ldots, N-1 \tag{98}$$

where the pair $(m, k)$ is related to the slot index $n$ by $\mathcal{M}(m, k) = n$. For $K = 1$ (and hence $k = 0$ and $N = M$ in (98)), the index $n = m$ runs over time, resulting in

$$p^{(l)}(n) = \exp\Big(-j\frac{2\pi}{M}p_l\nu n\Big), \quad n = 0, \ldots, M-1 \tag{99}$$

which shows that the precoder simply introduces a frequency offset across transmit antennas, a technique known as phase rolling [27]-[32]. On the other hand, for $M = 1$ (and hence $m = 0$ and $N = K$ in (98)), the index $n = k$ runs over frequency and we obtain

$$p^{(l)}(n) = \exp\Big(-j\frac{2\pi}{K}q_l\tau n\Big), \quad n = 0, \ldots, K-1$$

which shows that the precoder induces a time offset, i.e., a delay, across transmit antennas and hence corresponds to delay diversity as proposed in [25], [26], [31], [32]. In the case of general $M$ and $K$, the precoder in (98) induces time and frequency shifts. While delay diversity and phase rolling are well-known and easy-to-implement transmit diversity techniques for MISO systems that have been shown to have the potential of realizing full diversity gain for $r = 0$, it is surprising to see that they result in DM tradeoff optimality (when combined with proper outer codes) for multiplexing rates greater than zero.
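The two special cases of (98) follow from the DFT shift theorem, which this short NumPy sketch checks. It assumes NumPy's unnormalized `np.fft.fft`/`np.fft.ifft` convention and arbitrary illustrative shift values `d` and `p` (standing in for $q_l\tau$ and $p_l\nu$). A per-tone phase ramp is a cyclic delay in time (delay diversity), and a per-sample phase ramp in time is a cyclic shift in frequency (phase rolling).

```python
import numpy as np

rng = np.random.default_rng(9)

# M = 1: per-tone phase ramp exp(-j 2 pi d k / K) <=> cyclic delay by d samples in time
K, d = 8, 3                                    # d plays the role of q_l * tau
x = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # time-domain OFDM symbol
X = np.fft.fft(x)
x_precoded = np.fft.ifft(X * np.exp(-2j * np.pi * np.arange(K) * d / K))

# K = 1: per-sample phase ramp exp(-j 2 pi p n / M) in time <=> cyclic shift in frequency
M, p = 8, 2                                    # p plays the role of p_l * nu
x_t = rng.standard_normal(M) + 1j * rng.standard_normal(M)
spectrum_rolled = np.fft.fft(x_t * np.exp(-2j * np.pi * p * np.arange(M) / M))
```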



VII. Conclusion

Analyzing the high-SNR outage behavior of the Jensen channel instead of the original channel was found to be an effective tool for establishing the optimal DM tradeoff in general selective-fading MIMO channels. Our achievability proof reveals a code design criterion for DM tradeoff optimality, based on which we showed that the code design problem can be solved in a systematic fashion by combining a precoder adapted to the channel statistics with an outer code that is DM tradeoff optimal for parallel fading channels. The main result of the paper is supported by an appealing geometric argument, first provided in the flat-fading case in [1]. Finally, we note that the concepts introduced in this paper can be extended to multiple-access selective-fading MIMO channels [33] and to the analysis of the DM tradeoff properties of relay channels [34].

Appendix I

Proof of Theorem 1

We start by deriving an upper bound on the average (w.r.t. the random channel) pairwise error probability (PEP). Assuming that $\mathbf{X} = [\mathbf{x}_0 \cdots \mathbf{x}_{N-1}]$ was transmitted, the probability of the ML decoder mistakenly deciding in favor of codeword $\mathbf{X}' = [\mathbf{x}'_0 \cdots \mathbf{x}'_{N-1}]$ can be upper-bounded in terms of the codeword difference matrix $\mathbf{E} = [\mathbf{e}_0 \cdots \mathbf{e}_{N-1}]$ with $\mathbf{e}_n = \mathbf{x}_n - \mathbf{x}'_n$ as

(100)

(101)
where (101) follows from the Chernoff bound on the PEP, $\mathbb{H}_w$ denotes an $M_R \times M_T N$ matrix with i.i.d. $\mathcal{CN}(0,1)$ entries, and we have introduced the matrix

$$\mathbf{T} = \left(\mathbf{R}_{\mathbb{H}}^{T/2} \otimes \mathbf{I}_{M_T}\right)\mathrm{diag}\{\mathbf{e}_n\}_{n=0}^{N-1}. \qquad (102)$$

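Noting that

$$\mathbf{T}^H\mathbf{T} = \mathbf{R}_{\mathbb{H}}^T \odot \mathbf{E}^H\mathbf{E} \qquad (103)$$

and using the fact that the nonzero eigenvalues of $\mathbf{T}^H\mathbf{T}$ equal the nonzero eigenvalues of $\mathbf{T}\mathbf{T}^H$ for every SNR, it follows, by assumption, that $\mathbf{T}\mathbf{T}^H$ has $\rho M_T$ nonzero$^{9}$ eigenvalues, denoted as $\lambda_1(\mathrm{SNR}) \le \lambda_2(\mathrm{SNR}) \le \cdots \le \lambda_{\rho M_T}(\mathrm{SNR})$ (see Sec. IV-A). Consider the eigenvalue decomposition $\mathbf{T}\mathbf{T}^H = \mathbf{U}\bar{\boldsymbol{\Lambda}}\mathbf{U}^H$, where the $NM_T \times NM_T$ matrix $\mathbf{U}$ is unitary and $\bar{\boldsymbol{\Lambda}} = \mathrm{diag}\{\boldsymbol{\Lambda}, \mathbf{0}\}$ with $\boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_k(\mathrm{SNR})\}_{k=1}^{\rho M_T}$. Since $\mathbb{H}_w\mathbf{U}$ has the same distribution as $\mathbb{H}_w$, we then have $\mathrm{Tr}(\mathbb{H}_w\mathbf{T}\mathbf{T}^H\mathbb{H}_w^H) \sim \mathrm{Tr}(\mathbb{H}_w\bar{\boldsymbol{\Lambda}}\mathbb{H}_w^H)$, where $\sim$ denotes equality in distribution. Hence, setting $\mathbf{H}_w = \mathbb{H}_w([1:M_R],[1:\rho M_T])$, it follows that

$$P(\mathbf{X}\to\mathbf{X}') \le \mathbb{E}_{\mathbf{H}_w}\left\{\exp\left(-\frac{\mathrm{SNR}}{4}\,\mathrm{Tr}\left(\mathbf{H}_w\boldsymbol{\Lambda}\mathbf{H}_w^H\right)\right)\right\}. \qquad (104)$$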
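The identity for $\mathbf{T}^H\mathbf{T}$ and the eigenvalue argument above can be checked numerically. The sketch below assumes that $\mathbf{T} = (\mathbf{R}^{T/2} \otimes \mathbf{I}_{M_T})\,\mathrm{diag}\{\mathbf{e}_n\}$ with $\mathbf{R}^{T/2} = (\mathbf{R}^{1/2})^T$ for a positive semidefinite correlation matrix $\mathbf{R}$ of rank $\rho$. Under this reading, $\mathbf{T}^H\mathbf{T} = \mathbf{R}^T \odot \mathbf{E}^H\mathbf{E}$. The sizes and the random $\mathbf{R}$ and $\mathbf{E}$ are illustrative assumptions.

```python
import numpy as np

def hermitian_sqrt(R):
    """Positive semidefinite square root R^{1/2} of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

rng = np.random.default_rng(1)
N, MT, rho = 4, 2, 2                      # illustrative sizes (assumed)

# Rank-rho PSD correlation matrix R (N x N) and codeword difference E = [e_0 ... e_{N-1}].
B = rng.standard_normal((N, rho)) + 1j * rng.standard_normal((N, rho))
R = B @ B.conj().T
E = rng.standard_normal((MT, N)) + 1j * rng.standard_normal((MT, N))

# diag{e_n}: NM_T x N block-diagonal matrix whose n-th diagonal block is the column e_n.
D = np.zeros((N * MT, N), dtype=complex)
for n in range(N):
    D[n * MT:(n + 1) * MT, n] = E[:, n]

T = np.kron(hermitian_sqrt(R).T, np.eye(MT)) @ D   # (R^{T/2} kron I_{M_T}) diag{e_n}
lhs = T.conj().T @ T
rhs = R.T * (E.conj().T @ E)                        # Hadamard product R^T (.) E^H E

# Nonzero eigenvalues of T^H T (N x N) and T T^H (NM_T x NM_T) coincide.
ev_small = np.linalg.eigvalsh(lhs)
ev_big = np.linalg.eigvalsh(T @ T.conj().T)
tol = 1e-9 * ev_big.max()
nz_small = np.sort(ev_small[ev_small > tol])
nz_big = np.sort(ev_big[ev_big > tol])
print(np.allclose(lhs, rhs), len(nz_big), np.allclose(nz_small, nz_big))
```

For generic random draws, $\mathbf{T}\mathbf{T}^H$ has exactly $\rho M_T$ nonzero eigenvalues here because the Hadamard product of a rank-$\rho$ and a rank-$M_T$ PSD matrix generically has rank $\min(N, \rho M_T)$, and these sizes give $N = \rho M_T$.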

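Next, we express the right-hand side (RHS) of (104) in terms of the Jensen channel $\mathcal{H} = \mathcal{H}_w(\mathbf{R}^{T/2} \otimes \mathbf{I}_{M})$, where $\mathbf{R} = \mathbf{R}_{\mathbb{H}}$ if $M_R \le M_T$, $\mathbf{R} = \mathbf{R}_{\mathbb{H}}^T$ if $M_R > M_T$, and $\mathcal{H}_w$ is defined in (32).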

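For $M_R \le M_T$, we note that $\mathbf{H}_w = \tilde{\mathcal{H}}_w$, with $\tilde{\mathcal{H}}_w = \mathcal{H}_w([1:M_R],[1:\rho M_T])$. Invoking Theorem 4 in Appendix II, we get

$$\mathrm{Tr}\left(\mathbf{H}_w\boldsymbol{\Lambda}\mathbf{H}_w^H\right) \ge \sum_{k=1}^{M_R}\lambda_k\left(\mathbf{H}_w\mathbf{H}_w^H\right)\lambda_{M_R+1-k}(\mathrm{SNR}) = \sum_{k=1}^{M_R}\lambda_k\left(\tilde{\mathcal{H}}_w\tilde{\mathcal{H}}_w^H\right)\lambda_{M_R+1-k}(\mathrm{SNR}). \qquad (105)$$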
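The bound (105) pairs the ascending eigenvalues of $\mathbf{H}_w\mathbf{H}_w^H$ with the $M_R$ smallest diagonal entries of $\boldsymbol{\Lambda}$ in reverse order. The Monte Carlo sketch below checks that the trace never falls below this pairing for random Gaussian $\mathbf{H}_w$. The sizes and the exponential stand-in values for $\lambda_k(\mathrm{SNR})$ are assumptions made for illustration only.

```python
import numpy as np

def trace_lower_bound(H, theta):
    """RHS of the trace bound: sum_{k=1}^{m} lambda_k(H H^H) * theta_{m+1-k},
    with lambda_k and theta_l sorted in ascending order and m = H.shape[0]."""
    m = H.shape[0]
    lam = np.linalg.eigvalsh(H @ H.conj().T)      # ascending eigenvalues of H H^H
    th = np.sort(theta)[:m]                        # m smallest theta values, ascending
    return float(np.sum(lam * th[::-1]))           # pairs lambda_k with theta_{m+1-k}

rng = np.random.default_rng(2)
MR, n = 2, 6                                       # e.g. M_R = 2 rows, rho*M_T = 6 columns (assumed)
gaps = []
for _ in range(1000):
    H = (rng.standard_normal((MR, n)) + 1j * rng.standard_normal((MR, n))) / np.sqrt(2)
    theta = np.sort(rng.exponential(size=n))       # stand-in for lambda_1(SNR) <= ... <= lambda_n(SNR)
    trace = float(np.real(np.trace(H @ np.diag(theta) @ H.conj().T)))
    gaps.append(trace - trace_lower_bound(H, theta))
print("smallest gap:", min(gaps))
```

A nonnegative smallest gap (up to rounding) is what Theorem 4 predicts, because the trace depends on $\mathbf{H}_w$ only through its singular values and right singular vectors.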

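For $M_R > M_T$, we set $\boldsymbol{\Lambda} = \mathrm{diag}\{\boldsymbol{\Lambda}_n\}_{n=0}^{\rho-1}$, where $\boldsymbol{\Lambda}_n = \mathrm{diag}\{\lambda_k(\mathrm{SNR})\}_{k=nM_T+1}^{(n+1)M_T}$, to get

$$\mathrm{Tr}\left(\mathbf{H}_w\boldsymbol{\Lambda}\mathbf{H}_w^H\right) = \sum_{n=0}^{\rho-1}\mathrm{Tr}\left(\mathbf{H}_{w,n}\boldsymbol{\Lambda}_n\mathbf{H}_{w,n}^H\right) \qquad (106)$$

where $\mathbf{H}_w = [\mathbf{H}_{w,0} \cdots \mathbf{H}_{w,\rho-1}]$. Because the eigenvalue ordering implies $\boldsymbol{\Lambda}_0 \preceq \boldsymbol{\Lambda}_n$ for all $n \neq 0$, we can invoke [18, Observation 7.7.2, Corollary 7.7.4(b)] to write $\mathrm{Tr}(\mathbf{H}_{w,n}\boldsymbol{\Lambda}_n\mathbf{H}_{w,n}^H) \ge \mathrm{Tr}(\mathbf{H}_{w,n}\boldsymbol{\Lambda}_0\mathbf{H}_{w,n}^H)$ for all $n \neq 0$. The RHS of (106) can therefore be lower-bounded as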



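$$\sum_{n=0}^{\rho-1}\mathrm{Tr}\left(\mathbf{H}_{w,n}\boldsymbol{\Lambda}_n\mathbf{H}_{w,n}^H\right) \ge \sum_{n=0}^{\rho-1}\mathrm{Tr}\left(\mathbf{H}_{w,n}\boldsymbol{\Lambda}_0\mathbf{H}_{w,n}^H\right) = \sum_{n=0}^{\rho-1}\mathrm{Tr}\left(\boldsymbol{\Lambda}_0^{1/2}\mathbf{H}_{w,n}^H\mathbf{H}_{w,n}\boldsymbol{\Lambda}_0^{1/2}\right) = \mathrm{Tr}\left(\boldsymbol{\Lambda}_0^{1/2}\left(\sum_{n=0}^{\rho-1}\mathbf{H}_{w,n}^H\mathbf{H}_{w,n}\right)\boldsymbol{\Lambda}_0^{1/2}\right) = \mathrm{Tr}\left(\boldsymbol{\Lambda}_0^{1/2}\tilde{\mathcal{H}}_w\tilde{\mathcal{H}}_w^H\boldsymbol{\Lambda}_0^{1/2}\right) \qquad (107)$$

$$\ge \sum_{k=1}^{M_T}\lambda_k\left(\tilde{\mathcal{H}}_w\tilde{\mathcal{H}}_w^H\right)\lambda_{M_T+1-k}(\mathrm{SNR}) \qquad (108)$$

where we set $\tilde{\mathcal{H}}_w = \mathcal{H}_w([1:M_T],[1:\rho M_R])$, with $\mathcal{H}_w$ given by (32), to get (107), and (108) follows immediately upon applying Theorem 4 in Appendix II to (107).

$^{9}$ We recall that "nonzero eigenvalue" refers to an eigenvalue that is not identically equal to zero for all SNR values.

Combining (105) and (108), we have, for general $M_T$ and $M_R$, that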

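$$\mathrm{Tr}\left(\mathbf{H}_w\boldsymbol{\Lambda}\mathbf{H}_w^H\right) \ge \sum_{k=1}^{m}\lambda_k\left(\tilde{\mathcal{H}}_w\tilde{\mathcal{H}}_w^H\right)\lambda_{m+1-k}(\mathrm{SNR}) = \sum_{k=1}^{m}\mathrm{SNR}^{-\alpha_k}\lambda_{m+1-k}(\mathrm{SNR}) \qquad (109)$$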

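where (109) follows from the definition in (36). Using (109) in (104), we obtain a PEP upper bound in terms of the singularity levels $\alpha_k$ ($k = 1,\ldots,m$) characterizing the Jensen outage event:

$$P(\mathbf{X}\to\mathbf{X}') \le \mathbb{E}_{\boldsymbol{\alpha}}\left\{\exp\left(-\frac{1}{4}\sum_{k=1}^{m}\mathrm{SNR}^{1-\alpha_k}\lambda_{m+1-k}(\mathrm{SNR})\right)\right\}. \qquad (110)$$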
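The regrouping behind (107) and the subsequent bound (108) for $M_R > M_T$ can be checked on random data. The sketch below makes three assumptions. First, it stacks $\mathbf{H}_{w,n}^H$ side by side into an $M_T \times \rho M_R$ matrix `G`, so that $\mathbf{G}\mathbf{G}^H = \sum_n \mathbf{H}_{w,n}^H\mathbf{H}_{w,n}$; this matrix plays the role of $\tilde{\mathcal{H}}_w$. Second, the sizes are illustrative. Third, the values standing in for $\lambda_k(\mathrm{SNR})$ are random.

```python
import numpy as np

def trace_lower_bound(G, theta):
    """sum_{k=1}^{m} lambda_k(G G^H) theta_{m+1-k} (ascending orderings, m = G.shape[0])."""
    m = G.shape[0]
    lam = np.linalg.eigvalsh(G @ G.conj().T)
    th = np.sort(theta)[:m]
    return float(np.sum(lam * th[::-1]))

rng = np.random.default_rng(3)
MR, MT, rho = 3, 2, 3                                  # assumed sizes with M_R > M_T
lam = np.sort(rng.exponential(size=rho * MT))          # lambda_1(SNR) <= ... <= lambda_{rho M_T}(SNR)
Lam = [np.diag(lam[n * MT:(n + 1) * MT]) for n in range(rho)]   # Lambda_n blocks
Hn = [(rng.standard_normal((MR, MT)) + 1j * rng.standard_normal((MR, MT))) / np.sqrt(2)
      for _ in range(rho)]                             # H_{w,n}, n = 0..rho-1

full = sum(np.trace(H @ L @ H.conj().T).real for H, L in zip(Hn, Lam))   # RHS of (106)
lower = sum(np.trace(H @ Lam[0] @ H.conj().T).real for H in Hn)          # every Lambda_n -> Lambda_0
G = np.hstack([H.conj().T for H in Hn])                # M_T x rho*M_R, so G G^H = sum_n H^H H
L0h = np.sqrt(Lam[0])                                  # Lambda_0^{1/2} (diagonal, entrywise sqrt)
regrouped = np.trace(L0h @ G @ G.conj().T @ L0h).real  # regrouped form as in (107)
bound = trace_lower_bound(G, np.diag(Lam[0]))          # eigenvalue bound as in (108)
print(full >= lower, np.isclose(lower, regrouped), regrouped >= bound)
```

The first comparison reflects $\boldsymbol{\Lambda}_0 \preceq \boldsymbol{\Lambda}_n$. The second is an exact algebraic identity. The third is Theorem 4 applied to the $M_T \times M_T$ matrix $\mathbf{G}\mathbf{G}^H$.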
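Next, consider a realization of the random vector $\boldsymbol{\alpha}$ and let $\mathcal{S}_{\boldsymbol{\alpha}} = \{k : \alpha_k < 1\}$. For $\mathcal{S}_{\boldsymbol{\alpha}} \neq \emptyset$, we have

$$\sum_{k=1}^{m}\mathrm{SNR}^{1-\alpha_k}\lambda_{m+1-k}(\mathrm{SNR}) \ge \sum_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}\mathrm{SNR}^{1-\alpha_k}\lambda_{m+1-k}(\mathrm{SNR}) \ge |\mathcal{S}_{\boldsymbol{\alpha}}|\left(\mathrm{SNR}^{\sum_{k=1}^{m}[1-\alpha_k]^+}\prod_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}\lambda_{m+1-k}(\mathrm{SNR})\right)^{1/|\mathcal{S}_{\boldsymbol{\alpha}}|} \qquad (111)$$

where we used the arithmetic-geometric mean inequality and the identity

$$\sum_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}(1-\alpha_k) = \sum_{k=1}^{m}[1-\alpha_k]^+$$

which is an immediate consequence of the definition of $\mathcal{S}_{\boldsymbol{\alpha}}$. Using (111) in (110), we obtain

$$P(\mathbf{X}\to\mathbf{X}') \le \mathbb{E}_{\boldsymbol{\alpha}}\left\{\exp\left(-\frac{|\mathcal{S}_{\boldsymbol{\alpha}}|}{4}\left(\mathrm{SNR}^{\sum_{k=1}^{m}[1-\alpha_k]^+}\prod_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}\lambda_{m+1-k}(\mathrm{SNR})\right)^{1/|\mathcal{S}_{\boldsymbol{\alpha}}|}\right)\right\}. \qquad (112)$$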
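The two inequalities in (111) can be spot-checked for random singularity levels. The sketch below draws $\alpha_k \in [0,2]$ and values $0 < \lambda_1 \le \cdots \le \lambda_m \le 1$, fixes an SNR, and skips draws with $\mathcal{S}_{\boldsymbol{\alpha}} = \emptyset$. All numerical values are illustrative assumptions.

```python
import numpy as np

def amgm_chain(alpha, lam, snr):
    """Return the three quantities compared in (111) for one realization of alpha.
    alpha[k-1] = alpha_k and lam[k-1] = lambda_k(SNR), k = 1..m."""
    lam_rev = lam[::-1]                        # lam_rev[k-1] = lambda_{m+1-k}(SNR)
    terms = snr ** (1.0 - alpha) * lam_rev     # SNR^{1-alpha_k} lambda_{m+1-k}(SNR)
    S = alpha < 1.0                            # indicator of S_alpha = {k : alpha_k < 1}
    size = int(S.sum())                        # |S_alpha|, assumed >= 1 here
    exponent = np.maximum(1.0 - alpha, 0.0).sum()   # sum_k [1 - alpha_k]^+
    geo = size * (snr ** exponent * np.prod(lam_rev[S])) ** (1.0 / size)
    return terms.sum(), terms[S].sum(), geo

rng = np.random.default_rng(5)
m, snr = 4, 1e3
ok = True
for _ in range(2000):
    alpha = rng.uniform(0.0, 2.0, m)
    if not np.any(alpha < 1.0):
        continue                               # (111) is only used when S_alpha is nonempty
    lam = np.sort(rng.uniform(0.01, 1.0, m))   # 0 < lambda_1 <= ... <= lambda_m <= 1
    total, partial, geo = amgm_chain(alpha, lam, snr)
    ok = ok and bool(total >= partial * (1 - 1e-12) and partial >= geo * (1 - 1e-12))
print(ok)
```

The first inequality only drops nonnegative terms. The second is the AM-GM inequality applied to the terms indexed by $\mathcal{S}_{\boldsymbol{\alpha}}$.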



July 14, 2009 



DRAFT 



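The dependency of the PEP upper bound (112) on the singularity levels characterizing the Jensen outage event suggests splitting up the error probability according to

$$P_e(\mathcal{C}_r) = P(\mathrm{error}, \boldsymbol{\alpha}\in\mathcal{J}_r) + P(\mathrm{error}, \boldsymbol{\alpha}\notin\mathcal{J}_r) = P(\mathcal{J}_r)\,P(\mathrm{error}\,|\,\boldsymbol{\alpha}\in\mathcal{J}_r) + P(\bar{\mathcal{J}}_r)\,P(\mathrm{error}\,|\,\boldsymbol{\alpha}\notin\mathcal{J}_r) \le P(\mathcal{J}_r) + P(\bar{\mathcal{J}}_r)\,P(\mathrm{error}\,|\,\boldsymbol{\alpha}\notin\mathcal{J}_r). \qquad (113)$$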

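For any $\boldsymbol{\alpha} \notin \mathcal{J}_r$ with $r > 0$, we have, by definition, $\sum_{k=1}^{m}[1-\alpha_k]^+ \ge r$ and consequently $|\mathcal{S}_{\boldsymbol{\alpha}}| \ge 1$. Noting that $|\mathcal{C}_r(\mathrm{SNR})| = \mathrm{SNR}^{rN}$, this yields the following union bound based on the PEP in (112):

$$P(\mathrm{error}\,|\,\boldsymbol{\alpha}\notin\mathcal{J}_r) \le \mathrm{SNR}^{rN}\exp\left(-\frac{1}{4}\left(\mathrm{SNR}^{r}\prod_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}\lambda_{m+1-k}(\mathrm{SNR})\right)^{1/m}\right) \qquad (114)$$

where we used $|\mathcal{S}_{\boldsymbol{\alpha}}| \le m$. Next, we note that the code design criterion in (45) implies that $\prod_{k=1}^{m}\lambda_k(\mathrm{SNR}) \ge \mathrm{SNR}^{-(r-\epsilon)}$ for some $\epsilon > 0$ that is constant w.r.t. SNR and $r$. Recalling from (43) that $\lambda_k(\mathrm{SNR}) \le 1$ for all $k$, we necessarily have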



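$$\prod_{k\in\mathcal{S}_{\boldsymbol{\alpha}}}\lambda_{m+1-k}(\mathrm{SNR}) \ge \mathrm{SNR}^{-(r-\epsilon)} \qquad (115)$$

for any $\mathcal{S}_{\boldsymbol{\alpha}} \subseteq \{1,\ldots,m\}$. Using (115) in (114), we get

$$P(\mathrm{error}, \boldsymbol{\alpha}\notin\mathcal{J}_r) = P(\bar{\mathcal{J}}_r)\,P(\mathrm{error}\,|\,\boldsymbol{\alpha}\notin\mathcal{J}_r) \le \mathrm{SNR}^{rN}\exp\left(-\frac{\mathrm{SNR}^{\epsilon/m}}{4}\right). \qquad (116)$$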

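In contrast to the Jensen outage probability, which satisfies $P(\mathcal{J}_r) \doteq \mathrm{SNR}^{-d_{\mathcal{J}}(r)}$, the quantity on the RHS of (116) decays exponentially in $\mathrm{SNR}^{\epsilon/m}$, i.e., faster than any polynomial in SNR, for any $r > 0$. Hence, upon inserting (116) in (113), we obtain

$$P_e(\mathcal{C}_r) \;\dot{\le}\; P(\mathcal{J}_r) \qquad (117)$$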

for $r > 0$. Since $P(\mathcal{J}_r) \le P(\mathcal{O}_r)$, it follows trivially that $P(\mathcal{J}_r) \;\dot{\le}\; P(\mathcal{O}_r)$. In addition, for a given family of codes $\mathcal{C}_r$, we have $P(\mathcal{O}_r) \;\dot{\le}\; P_e(\mathcal{C}_r)$. Putting the pieces together, thanks to (117), we obtain that for any $r > 0$

$$P(\mathcal{O}_r) \;\dot{\le}\; P_e(\mathcal{C}_r) \;\dot{\le}\; P(\mathcal{J}_r) \;\dot{\le}\; P(\mathcal{O}_r)$$

which implies that

$$P_e(\mathcal{C}_r) \doteq P(\mathcal{J}_r) \doteq P(\mathcal{O}_r)$$

and hence, by definition of $d_{\mathcal{J}}(r)$, we get

$$P_e(\mathcal{C}_r) \doteq \mathrm{SNR}^{-d_{\mathcal{J}}(r)}. \qquad (118)$$

Finally, as (118) holds for any $r > 0$ arbitrarily close to zero, we can invoke the continuity of the piecewise linear function $d_{\mathcal{J}}(r)$ to conclude that (118) also holds in the limit $r \to 0$ [1, Proof of Lemma 5], hence establishing the desired result.
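To see why the term in (116) is negligible against a polynomial decay $\mathrm{SNR}^{-d}$, it helps to work in the log domain, where $\exp(-\mathrm{SNR}^{\epsilon/m}/4)$ does not underflow. The parameters $r$, $N$, $\epsilon$, $m$, and $d$ below are arbitrary illustrative assumptions, not values from the paper.

```python
import numpy as np

def log_union_bound(log_snr, r, N, eps, m):
    """Natural log of the RHS of (116): SNR^{rN} * exp(-SNR^{eps/m} / 4)."""
    return r * N * log_snr - np.exp((eps / m) * log_snr) / 4.0

# Illustrative parameters only (assumed, not taken from the paper).
r, N, eps, m, d = 0.5, 8, 0.2, 2, 10.0
log_snr = np.log(10.0) * np.array([10.0, 50.0, 100.0, 200.0, 400.0])   # SNR = 10^10 ... 10^400
# log of the ratio between the RHS of (116) and a polynomial reference SNR^{-d}
gap = log_union_bound(log_snr, r, N, eps, m) + d * log_snr
print(gap)
```

For these parameters the gap is still positive at $\mathrm{SNR} = 10^{10}$, where the polynomial factor $\mathrm{SNR}^{rN}$ dominates. It becomes negative and keeps decreasing at higher SNR. Only this limiting behavior matters for the exponential equalities above.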

Appendix II 
Least favorable channel 

The result proved below is a generalization of [35, Theorem 2]. In what follows, we shall use $\mathcal{U}_n$, $\mathcal{D}_n$, and $\mathcal{P}_n$ to denote the sets of all $n \times n$ unitary, doubly stochastic, and permutation matrices, respectively.

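Theorem 4: Consider the nonnegative real numbers $\lambda_k$, $k = 1,\ldots,m$, and $\theta_l$, $l = 1,\ldots,n$, with $m \le n$, each sorted in ascending order. Let the $m \times n$ matrix $\mathbf{A}$ be such that $\mathbf{A}(k,k) = \lambda_k^{1/2}$ for $k = 1,\ldots,m$ and $\mathbf{A}(k,l) = 0$ for $k \neq l$. Letting the $n \times n$ matrix $\boldsymbol{\Theta}$ be given by $\boldsymbol{\Theta} = \mathrm{diag}\{\theta_l\}_{l=1}^{n}$, we have

$$\min_{\mathbf{Q}\in\mathcal{U}_n}\mathrm{Tr}\left(\mathbf{A}\mathbf{Q}\boldsymbol{\Theta}\mathbf{Q}^H\mathbf{A}^H\right) = \sum_{k=1}^{m}\lambda_k\theta_{m+1-k}.$$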

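Proof: Straightforward manipulations show that

$$\min_{\mathbf{Q}\in\mathcal{U}_n}\mathrm{Tr}\left(\mathbf{A}\mathbf{Q}\boldsymbol{\Theta}\mathbf{Q}^H\mathbf{A}^H\right) = \min_{\mathbf{Q}\in\mathcal{U}_n}\sum_{k=1}^{m}\lambda_k\sum_{l=1}^{n}\theta_l|\mathbf{Q}(k,l)|^2 \ge \min_{\mathbf{D}\in\mathcal{D}_n}\sum_{k=1}^{m}\lambda_k\sum_{l=1}^{n}\theta_l\mathbf{D}(k,l) \qquad (119)$$

where the matrix $\mathbf{D}$ with $\mathbf{D}(i,j) = |\mathbf{Q}(i,j)|^2$ is doubly stochastic whenever $\mathbf{Q}$ is unitary. The inequality in (119) is therefore a consequence of enlarging the set of admissible matrices from $\{\mathbf{D} : \mathbf{D}(i,j) = |\mathbf{Q}(i,j)|^2,\ \mathbf{Q}\in\mathcal{U}_n\}$ to $\mathcal{D}_n$. Since the set of doubly stochastic matrices is compact and convex, a linear function, such as the one in (119), attains its minimum at an extreme point of this set [18, Appendix B]. By Birkhoff's Theorem [18, Theorem 8.7.1], the extreme points of the set of doubly stochastic matrices are the permutation matrices. Hence,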

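$$\min_{\mathbf{D}\in\mathcal{D}_n}\sum_{k=1}^{m}\lambda_k\sum_{l=1}^{n}\theta_l\mathbf{D}(k,l) = \min_{\boldsymbol{\Pi}\in\mathcal{P}_n}\sum_{k=1}^{m}\lambda_k\sum_{l=1}^{n}\theta_l\boldsymbol{\Pi}(k,l) = \min_{\pi}\sum_{k=1}^{m}\lambda_k\theta_{\pi(k)} = \sum_{k=1}^{m}\lambda_k\theta_{m+1-k} \qquad (120)$$

where $\pi$ ranges over all permutations of $\{1,\ldots,n\}$. The last equality follows from the rearrangement inequality: it is optimal to use the $m$ smallest $\theta_l$ and to pair them with the $\lambda_k$ in reverse order.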
The proof is concluded by noting that permutation matrices are also unitary, i.e., $\mathcal{P}_n \subset \mathcal{U}_n$. The lower bound in (119) is therefore attained by a permutation matrix, and the minimum over $\mathcal{U}_n$ equals (120). ■
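Theorem 4 can be sanity-checked numerically for small sizes. The sketch below makes three checks. It enumerates all $n!$ permutation matrices and compares the best one with the closed form $\sum_k \lambda_k\theta_{m+1-k}$. It samples random unitary matrices, which should never go below that value. It also confirms that the entrywise objective matches the matrix trace $\mathrm{Tr}(\mathbf{A}\mathbf{Q}\boldsymbol{\Theta}\mathbf{Q}^H\mathbf{A}^H)$. The sizes, the random inputs, and the helper names are illustrative assumptions.

```python
import itertools
import numpy as np

def theorem4_value(lam, theta):
    """Closed form sum_{k=1}^{m} lambda_k theta_{m+1-k} (both sorted ascending)."""
    m = len(lam)
    lam, theta = np.sort(lam), np.sort(theta)
    return float(np.sum(lam * theta[:m][::-1]))

def objective(lam, theta, Q):
    """Tr(A Q Theta Q^H A^H) = sum_k lambda_k sum_l theta_l |Q(k,l)|^2."""
    m = len(lam)
    weights = np.abs(Q[:m, :]) ** 2
    return float(np.sum(np.asarray(lam)[:, None] * weights * np.asarray(theta)[None, :]))

def random_unitary(n, rng):
    """Haar-distributed unitary via QR of a complex Gaussian matrix with phase correction."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Qr, Rr = np.linalg.qr(Z)
    d = np.diag(Rr)
    return Qr * (d / np.abs(d))

rng = np.random.default_rng(4)
m, n = 3, 5
lam = np.sort(rng.uniform(0, 1, m))
theta = np.sort(rng.uniform(0, 1, n))

closed_form = theorem4_value(lam, theta)
best_perm = min(objective(lam, theta, np.eye(n)[list(p)]) for p in itertools.permutations(range(n)))
best_random = min(objective(lam, theta, random_unitary(n, rng)) for _ in range(2000))

# Cross-check the entrywise objective against the matrix form of the trace.
A = np.zeros((m, n))
A[np.arange(m), np.arange(m)] = np.sqrt(lam)
Q = random_unitary(n, rng)
direct = np.trace(A @ Q @ np.diag(theta) @ Q.conj().T @ A.conj().T).real
print(closed_form, best_perm, best_random)
```

The permutation search reproduces the closed form exactly. Random unitary matrices, being feasible points of the minimization, can only give larger values.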